Abstract

We describe how noise propagates through a network by calculat- 
ing the variance of the outputs. Using stochastic calculus and dynam- 
ical systems theory, we study the network topologies that accentuate 
or alleviate the effect of random variance in the network for both 
directed and undirected graphs. Given a linear tree network, the vari- 
ance in the output is a convex function of the poles of the individual 
nodes. Cycles create correlations which in turn increase the variance 
in the output. Feedforward and feedback have a limited effect on noise
propagation when the respective cycles are sufficiently long. Crosstalk
between the elements of different pathways helps reduce the output 
noise, but makes the network slower. Next, we study the differences 
between disturbances in the inputs and disturbances in the network 
parameters, and how they propagate to the outputs. Finally, we show 
how noise correlations can affect the steady state of the system in 
chemical reaction networks with reactions of two or more reactants, 
each of which may be affected by independent or correlated noise 
sources. 
1 Introduction and Overview



Noise is ubiquitous in nature, and virtually all signals carry some amount of 
random noise. In addition, even the simplest systems can be represented as 
a set of smaller subsystems interconnected with each other. There have been 
numerous studies on how noise affects specific functions (e.g. [1], [ ] and 
references therein), but none of them has looked at how noise propagates in 
general networks, and how various network structures impact the robustness 
of each system to noise. Although there is evidence that it may degrade 
the system performance, noise is sometimes necessary for specific functions 
[3]. Networks in which information is transmitted through a means that is 
accessible by all the individual units of the network are prone to unwanted 
crosstalk interactions between various unrelated subsystems [4]. Both noise 
and crosstalk have been treated as something unwanted in engineering sys- 
tems. However, they do not seem to be a problem in the cell, or in natural 
biological systems in general, despite the large number of noise sources, the 
variety of molecules, and the intricate patterns of interactions. 

We present a new method to quantify the noise propagation in a system, 
and the vulnerability of each of its subsystems. We use results from graph 
theory and control systems theory to quantify noise propagation in networks, 
and use them to evaluate various network structures in terms of how well they 
filter out noise. We study how crosstalk can help suppress noise, when the 
noise sources are independent or correlated. We show that perturbations 
that depend on the state of the system (for example, feedback loops that 
are prone to noise or noisy degradation rates) have a fundamentally different 
effect on the system output, compared to noise in the inputs. Finally, we 
study noise propagation in chemical reaction networks where all reactants 
may introduce noise, and analytically find that noise correlations may affect 
the expected behavior of such systems. 

2 Background 

2.1 General Response of Linear Systems 

In this section, we will briefly revisit some basic tools from control systems 
theory. Consider a linear time invariant system with impulse response h(t, s) 
[ ]. The general form of the output when the input signal is u(t) is




$$y(t) = \int_{-\infty}^{t} h(t,s)\, u(s)\, ds \tag{1}$$

where h(t, s) is the impulse response of the dynamical system. A system with
m inputs, n states and p outputs can be written in the form

$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t) \tag{2}$$

where the dimensions of matrices A, B and C are n x n, n x m and p x n 
respectively. The output of the system at time t when the input is an impulse 
applied at time s is 

$$h(t,s) = C e^{A(t-s)} B \tag{3}$$

and equation (1) can be simplified to

$$y(t) = C \int_{-\infty}^{t} e^{A(t-s)} B\, u(s)\, ds. \tag{4}$$

When the network in question is comprised of elements whose outputs 
obey linear time-invariant differential equations, we can also find the Fourier 
transform of the network output: 

$$H(\omega) = \int_{-\infty}^{+\infty} h(t)\, e^{-j\omega t}\, dt, \tag{5}$$

where h(t) = h(t, 0) is the impulse response of the system and $\omega = 2\pi f$ is
the angular frequency. If the system is causal (h(t) = 0 for t < 0), then the
expression above can be simplified by replacing the lower limit of the integral
with zero.

When the input is a stochastic process, its output will be a stochastic 
process as well. We are interested in the mean, the variance, and occasionally 
the higher central moments of the system output once the system has reached 
its equilibrium state. The mean E[y(t)] and the variance V[y(t)] of the output
y(t) will be denoted as E[y] and V[y] respectively:

$$E[y] = \lim_{t\to\infty} E[y(t)] \quad \text{and} \quad V[y] = \lim_{t\to\infty} V[y(t)]. \tag{6}$$
If we know the impulse response of the system, the mean of the output vector
can be expressed as

$$E[y(t)] = E\left[\int_{-\infty}^{t} h(t-s)\, u(s)\, ds\right] = \int_{-\infty}^{t} h(t-s)\, E[u(s)]\, ds \tag{7}$$



where in the last equation we have interchanged the expectation with the 
integration operator, assuming that the input functions are non-pathological, 
and the quantities are finite, such that all the integrands are measurable in 
the respective measure space (Fubini's theorem, [6]). In what follows, we will 
always assume that all such conditions are satisfied. 

The covariance matrix of the outputs, when applying the same input, is

$$V[y(t)] = E[y(t)\, y^T(t)] - E[y(t)]\, E[y^T(t)] = \int_{-\infty}^{t}\!\int_{-\infty}^{t} h(t-r) \left( E[u(r)u^T(s)] - E[u(r)]\, E[u^T(s)] \right) h^T(t-s)\, dr\, ds. \tag{8}$$

If in addition u(t) = 0 for t < 0, then according to equation (6),

$$V[y] = \lim_{t\to\infty} \int_{0}^{t}\!\int_{0}^{t} h(t-r) \left( E[u(r)u^T(s)] - E[u(r)]\, E[u^T(s)] \right) h^T(t-s)\, dr\, ds. \tag{9}$$


2.2 Wiener Process 

In this subsection, we will be describing some elementary properties of the 
Wiener process that will be used in the following analysis. Let $\xi_n$, $n \in \mathbb{N}$, be
a sequence of independent identically distributed random variables with zero
mean and unit standard deviation. Their sum is

$$S_n = \sum_{k=1}^{n} \xi_k. \tag{10}$$

We now define the piecewise constant function

$$W_t = \lim_{n\to\infty} \frac{S_{\lfloor nt \rfloor}}{\sqrt{n}}. \tag{11}$$
According to the Central Limit Theorem, the distribution of $W_t$ is independent
of the distribution of the sequence of $\xi_n$, as long as they have finite
variance, are identically distributed and independent of each other. The
random process $W_t$ is normally distributed with variance equal to the time
interval over which it is measured:

$$W_t = \lim_{n\to\infty} \frac{S_{\lfloor nt \rfloor}}{\sqrt{nt}} \cdot \frac{\sqrt{nt}}{\sqrt{n}} \implies W_t \sim \mathcal{N}(0, t). \tag{12}$$

The difference of two sums $S_b - S_a$ with a < b has the same distribution as
the random variable $S_{b-a}$, and as a result

$$W_b - W_a \sim W_{b-a}, \qquad 0 \le a < b. \tag{13}$$

Lastly, the random variables $W_b - W_a$ and $W_d - W_c$ are independent when
$0 \le a < b \le c < d$, since the respective sums consist of independent random
variables. More details on the properties of the Wiener process can be found
in [6].
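
The scaling in equations (11) and (12) can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the text: the steps $\xi_k$ are taken to be fair $\pm 1$ coin flips, and the values of `t`, `n` and `samples` are arbitrary. It estimates the mean and variance of $S_{\lfloor nt \rfloor}/\sqrt{n}$, which should be close to 0 and t respectively.

```python
import math
import random

def scaled_walk(t, n, rng):
    """Return S_{floor(n t)} / sqrt(n) for i.i.d. +/-1 steps (zero mean, unit variance)."""
    steps = int(n * t)
    s = sum(rng.choice((-1.0, 1.0)) for _ in range(steps))
    return s / math.sqrt(n)

rng = random.Random(0)            # fixed seed so the run is repeatable
t, n, samples = 2.0, 100, 3000    # illustrative values, not taken from the text

values = [scaled_walk(t, n, rng) for _ in range(samples)]
mean = sum(values) / samples
variance = sum((v - mean) ** 2 for v in values) / (samples - 1)

print("sample mean    :", mean)       # expected to be close to 0
print("sample variance:", variance)   # expected to be close to t
```

Any other step distribution with zero mean and unit variance could be substituted in `scaled_walk`; by the Central Limit Theorem the estimated variance should still approach t.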



2.3 Graph Theory 

A graph (also called a network) is an ordered pair $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ comprised of
a set $\mathcal{V} = \mathcal{V}(\mathcal{G})$ of vertices together with a set $\mathcal{E} = \mathcal{E}(\mathcal{G})$ of edges that are
unordered 2-element subsets of $\mathcal{V}$. Two vertices u and v are called neighbors
if they are connected through an edge ($(u,v) \in \mathcal{E}$) and we write $u \sim v$,
otherwise we write $u \nsim v$. The neighborhood $\mathcal{N}_u$ of a vertex u is the set of
its neighbors. The degree of a vertex is the number of its neighbors. The
order N of a graph is the number of its vertices, $N = |\mathcal{V}|$. A graph's size
(denoted by $m = |\mathcal{E}|$) is the number of its edges. We will denote a graph
$\mathcal{G}$ of order N and size m as $\mathcal{G}(N,m)$ or simply $\mathcal{G}_{N,m}$. A path is a sequence
of consecutive edges in a graph and the length of the path is the number of
edges traversed. The distance between two vertices u and v, usually denoted
by $d = d(u,v)$, is the length of the shortest path that connects these two
vertices. A full cycle is a cycle that includes all the vertices of the network.
A graph is connected if for every pair of vertices u and v, there is a path
from u to v. Otherwise the graph is called disconnected. We will be focusing
exclusively on connected graphs, because every disconnected graph can be
analyzed as the sum of its connected components. A tree is a graph in which
any two vertices are connected by exactly one path. A path graph is a tree
with two or more vertices that has two vertices with degree 1, while all other
vertices have degree 2. A thorough treatment of the graph theory notions
used in this article can be found in [7].
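
As a small illustration of these definitions, the following sketch builds an undirected graph from an edge list and computes degrees, the distance d(u, v) by breadth-first search, and a tree test. The helper names (`adjacency`, `distance`, `is_connected`, `is_tree`) and the example graphs are hypothetical and chosen only for this example.

```python
from collections import deque

def adjacency(num_vertices, edges):
    """Neighborhoods N_u of an undirected graph given as a list of 2-element edges."""
    adj = {u: set() for u in range(num_vertices)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def distance(adj, source, target):
    """Length of the shortest path between two vertices, or None if disconnected."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return seen[u]
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return None

def is_connected(adj):
    start = next(iter(adj))
    return all(distance(adj, start, v) is not None for v in adj)

def is_tree(adj):
    """A connected graph of order N is a tree exactly when its size is N - 1."""
    size = sum(len(nbrs) for nbrs in adj.values()) // 2
    return is_connected(adj) and size == len(adj) - 1

# Path graph on 4 vertices: 0 - 1 - 2 - 3
path = adjacency(4, [(0, 1), (1, 2), (2, 3)])
path_degrees = [len(path[u]) for u in range(4)]

# Closing the path with the edge (3, 0) creates a full cycle.
cycle = adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])

print(path_degrees, distance(path, 0, 3), is_tree(path))
print(distance(cycle, 0, 3), is_tree(cycle))
```

The added edge shortens the distance between the two end vertices and destroys the tree property, which is the structural change studied in the section on cycles.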

3 White Noise Input 

In the state space, when the parameters of the system are deterministic and 
the input consists of a deterministic and a random component (white noise), 
then the system (2) is defined by the stochastic differential equation: 

$$\begin{cases} dx = A x\, dt + B\,(u_t\, dt + \Sigma_t\, dW_t) \\ y = C x, \end{cases} \tag{14}$$

where $dW_t = W_{t+dt} - W_t$ is the increment of the standard vector Wiener process
in the time interval $[t, t+dt)$ and $u_t$ is a deterministic input. We will denote
the value of a function $f$ at time t as $f(t)$ or $f_t$ interchangeably. The matrix
$\Sigma_t$ consists of nonnegative entries, possibly time-varying, each of which is
proportional to the strength of the corresponding disturbance input. Note that
the only difference with the system (2) is that now the infinitesimal state
difference dx depends not only on the current state and the deterministic
input, but also on a random term $dW_t \sim \mathcal{N}(0, dt)$.

It should be noted that the fraction $dW_t/dt$ does not exist as $dt \to 0$, so
dividing both sides of equation (14) by dt would not make sense. But this
notation also helps us to intuitively understand the effect of randomness in
the system, when we know how the state of the system is affected by the
randomness in the inputs. It also helps us to easily generalize these results
when the randomness is a product of many noise sources as we will see in
the last section.

The different Wiener processes may be correlated with each other, but
since each input may consist of a weighted sum of all of the different processes
through multiplication by the matrix $\Sigma_t$, the analysis is simplified if we assume
that they are independent.

The output of the system is the superposition of the deterministic output
and the response to the random input:

$$y(t) = \int_{-\infty}^{t} h(t-s)\,(u(s)\, ds + \Sigma_s\, dW_s) = \int_{-\infty}^{t} h(t-s)\, u(s)\, ds + \int_{-\infty}^{t} h(t-s)\, \Sigma_s\, dW_s. \tag{15}$$

The expected value for the output, according to equation (7), will be

$$E[y(t)] = \int_{-\infty}^{t} h(t-s)\, E[u(s)\, ds + \Sigma_s\, dW_s] = \int_{-\infty}^{t} h(t-s)\, u(s)\, ds, \tag{16}$$

since Brownian motion is a martingale [6].

Applying equation (8) when the input is white noise, the covariance matrix
can be written as

$$V[y] = \lim_{t\to\infty} V[y(t)] = \lim_{t\to\infty} \int_{-\infty}^{t}\!\int_{-\infty}^{t} h(t-r)\, E\left[\Sigma_r\, dW_r\, dW_s^T\, \Sigma_s^T\right] h^T(t-s). \tag{17}$$

But since the inputs are assumed to be white noise processes, the covariance
among all of them is nonzero only if they take place during the same interval,
and in that case, the covariance is proportional to the length of this interval:

$$V[y] = \lim_{t\to\infty} \int_{-\infty}^{t}\!\int_{-\infty}^{t} h(t-r) \left( \Sigma_r \sqrt{dr}\, \delta(r-s)\, \sqrt{ds}\, \Sigma_s^T \right) h^T(t-s) = \lim_{t\to\infty} \int_{-\infty}^{t} h(t-s)\, V(s)\, h^T(t-s)\, ds, \tag{18}$$

where $V(s) = \Sigma_s \Sigma_s^T$ is the covariance matrix of the input random vector.
For the linear time invariant system (2) and white noise inputs of constant
variance, $V(s) = V$ is a constant matrix, and we can write

$$V[y] = \int_{-\infty}^{t} \left(C e^{A(t-s)} B\right) V \left(C e^{A(t-s)} B\right)^T ds = C \left( \int_{0}^{+\infty} e^{Ax}\, B V B^T e^{A^T x}\, dx \right) C^T. \tag{19}$$


The mean and the variance of the output signal in the steady state can 
be written as a function of the Fourier transforms of the input signal and the 
network transfer function. From equation (7) 



$$E[y(t)] = E\left[\int_{-\infty}^{t} h(t-s)\, u(s)\, ds\right] = h(t) * E[u(t)] \tag{20}$$

where $f(t) * g(t)$ denotes the convolution of two functions $f(t)$ and $g(t)$, given
that it exists.

When the input is constant with time, the expected value of the input is
constant as well ($E[u(t)] = \mu_x$) and the last expression can be simplified to

$$E[y] = \mu_x \int_{0}^{+\infty} h(s)\, ds = \mu_x H(0). \tag{21}$$

If the input itself is not known, but its frequency content can be estimated, 
we can find the variance of the output using Parseval's theorem: 

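$$\begin{aligned}
V[y] = E[y \cdot y^T] &= \lim_{T\to\infty} \int_{-\infty}^{T} y(t)\, y^T(t)\, dt \\
&= \int_{-\infty}^{+\infty} |Y(f)|^2\, df = \int_{-\infty}^{+\infty} Y(f)\, Y^*(f)\, df \\
&= \int_{-\infty}^{+\infty} H(f)\, X(f)\, X^*(f)\, H^*(f)\, df.
\end{aligned} \tag{22}$$
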

The formula above is useful if we know or we can estimate the various 
frequencies of the input random processes. More generally, if we know the 
autocorrelation function of the random processes in the input, we may find 
the expected autocorrelation in the output, and then estimate the output 
variance. 



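$$\begin{aligned}
R_y(\tau) &= \int_{-\infty}^{+\infty} S_y(f) \cos(2\pi f \tau)\, df \\
&= \int_{-\infty}^{+\infty} |H(f)|^2\, S_x(f) \cos(2\pi f \tau)\, df \\
&= \int_{-\infty}^{+\infty} |H(f)|^2 \left( \int_{-\infty}^{+\infty} R_x(u) \cos(2\pi f u)\, du \right) \cos(2\pi f \tau)\, df.
\end{aligned} \tag{23}$$
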
We will be focusing on Wiener processes exclusively, because this is the
most general approach for sums of random disturbances. The Central Limit 
Theorem shows that the sum of a large number of independent identically dis- 
tributed random variables with finite mean and variance always approaches 
the normal distribution (see also equation (12)). The only assumption in 
the case of additive disturbances is that the inputs at every time are sums 
of independent random variables of arbitrary distribution of finite standard 
deviation. This is a reasonable assumption in most settings. For example, in 
biology the Poisson distribution is frequently used to model random distur- 
bances [1]. The Poisson distribution can be well approximated by a Gaussian 
when the event rate is greater than 10 (see [8]), and the same can be said for 
small sums of Poisson random variables. When the input disturbance at each 
time is correlated with the disturbances during earlier times, the correlation 
structure can be emulated by passing white noise through a filter that pro- 
duces it. Also, in some applications, noise cannot be expected to have equal 
frequency content for all frequencies up to infinity. We can still use white 
noise as an input, which we can pass through a filter with zero response for 
all the frequencies outside the desired range. 

4 Tree Networks 

Tree networks are a special case of networks where there is a unique path 
among every pair of vertices. In other words, there are no cycles, which 
makes the analysis of such networks easier. Many natural networks have 
been found to be locally tree-like [9]. When analyzing the behavior of a
network around an equilibrium point, or if the network is linear, then the 
analysis can be significantly simplified. Since there is a unique path from any 
vertex to another, it suffices to analyze path networks, which consist of all 
their vertices connected in series. For each output, the total response of the 
system is the superposition of the signals caused for all the individual inputs. 
First, we will show that in the case of random signals, the order of the nodes 
in the network does not matter in the case of linear pathways. Then, we will 
find the variance of a linear path graph assuming that every node is a first 
order filter. The result can easily be generalized for the case of arbitrary tree 
graphs. Finally, we are going to find the optimal placement of poles so that 
the noise suppression is maximized. 
4.1 Output Variance of Linear Pathways



Lemma 1. The noise response of a linear pathway is independent of the 
relative position of its nodes. 

Proof. Without loss of generality, we can assume that the linear pathway 
has one input and one output. Otherwise, since the system is linear, we can 
repeat the process each time considering only the respective subtree. Under 
the last assumption, the output is the state of the last node, and all inputs 
affect only the first node. From equation (22): 



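$$\begin{aligned}
V[y] &= \int_{-\infty}^{+\infty} H(f)\, X(f)\, X^*(f)\, H^*(f)\, df \\
&= \int_{-\infty}^{+\infty} H(f) \left( X_1(f) + \cdots + X_M(f) \right) \left( X_1^*(f) + \cdots + X_M^*(f) \right) H^*(f)\, df \\
&= \sum_{k=1}^{M} \sum_{m=1}^{M} \int_{-\infty}^{+\infty} X_k(f)\, X_m^*(f)\, H(f)\, H^*(f)\, df \\
&= \sum_{k=1}^{M} \sum_{m=1}^{M} \int_{-\infty}^{+\infty} X_k(f)\, X_m^*(f) \left( h_1(f) \cdots h_N(f) \right) \left( h_N^*(f) \cdots h_1^*(f) \right) df \\
&= \sum_{k=1}^{M} \sum_{m=1}^{M} \int_{-\infty}^{+\infty} X_k(f)\, X_m^*(f) \prod_{n=1}^{N} |h_n(f)|^2\, df.
\end{aligned} \tag{24}$$
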

It is evident that we can interchange the transfer functions inside the product 
in the integral, without changing its value. □ 

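Assume that we have a linear pathway described by equation (2), where the
dynamical, input and output matrices are

$$A = \begin{bmatrix} -d_1 & 0 & \cdots & 0 \\ f_2 & -d_2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & f_N & -d_N \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad C^T = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.$$
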

For simplicity, we assume that there is only one noise source and only one 
output, but since there are no cycles, there is a unique path from each node 
to every other, which means we can use the result for a linear pathway
repeatedly, in order to find the total variance. The variance is independent 
of the deterministic input that is applied to the pathway, since the system is 
linear. 

Using equation (19), and after performing all calculations, the variance
at the output will be

$$V_{out} = \left( \prod_{u=2}^{N} f_u \right)^{2} \sum_{k=1}^{N} \sum_{m=1}^{N} \frac{1}{(d_k + d_m) \prod_{a=1, a \neq k}^{N} (d_k - d_a) \prod_{b=1, b \neq m}^{N} (d_m - d_b)}. \tag{25}$$

The expression above holds even if there exist two vertices a and b such
that their reaction rates are equal, according to the next Lemma.

Lemma 2. The output variance of a linear pathway does not depend on the
difference of any of the reaction rates.

Proof. We pick two rates $d_x$ and $d_y$ and show that $V_{out}$ does not depend
on their difference. If we denote

$$T_{k,m} = \frac{1}{(d_k + d_m) \prod_{a=1, a \neq k}^{N} (d_k - d_a) \prod_{b=1, b \neq m}^{N} (d_m - d_b)}, \tag{26}$$

the difference $d_x - d_y$ appears only in the terms $T_{x,x}$, $T_{x,y}$, $T_{y,x}$ and $T_{y,y}$. Their
sum $T_{x-y}$ is equal to

$$\begin{aligned}
T_{x-y} &= T_{x,x} + T_{x,y} + T_{y,x} + T_{y,y} \\
&= \frac{1}{2 d_x (d_x - d_y)^2 \prod_{s \neq x,y} (d_x - d_s)^2} + \frac{1}{2 d_y (d_y - d_x)^2 \prod_{s \neq x,y} (d_y - d_s)^2} \\
&\quad - \frac{2}{(d_x + d_y)(d_y - d_x)^2 \prod_{s \neq x,y} (d_x - d_s) \prod_{s \neq x,y} (d_y - d_s)}.
\end{aligned} \tag{27}$$

We set

$$P_x = \prod_{s \neq x,y} (d_x - d_s) \quad \text{and} \quad P_y = \prod_{s \neq x,y} (d_y - d_s) \tag{28}$$

so that the sum above can be written as

$$T_{x-y} = \frac{d_x (d_x + d_y) P_x^2 + d_y (d_x + d_y) P_y^2 - 4 d_x d_y P_x P_y}{2 d_x d_y (d_x + d_y)(d_y - d_x)^2 P_x^2 P_y^2}. \tag{29}$$

Expanding the numerator of $T_{x-y}$ and grouping the relevant terms together:

$$\begin{aligned}
T_{x-y} &= \frac{d_x^2 P_x^2 + d_x d_y P_x^2 + d_x d_y P_y^2 + d_y^2 P_y^2 - 4 d_x d_y P_x P_y}{2 d_x d_y (d_x + d_y)(d_y - d_x)^2 P_x^2 P_y^2} \\
&= \frac{\left( d_x^2 P_x^2 - 2 d_x d_y P_x P_y + d_y^2 P_y^2 \right) + d_x d_y \left( P_x^2 - 2 P_x P_y + P_y^2 \right)}{2 d_x d_y (d_x + d_y)(d_y - d_x)^2 P_x^2 P_y^2} \\
&= \frac{(d_x P_x - d_y P_y)^2 + d_x d_y (P_x - P_y)^2}{2 d_x d_y (d_x + d_y)(d_y - d_x)^2 P_x^2 P_y^2}.
\end{aligned} \tag{30}$$

Both terms in the numerator of the last fraction have a factor of order
$(d_y - d_x)^2$, which cancels the same factor in the denominator. Hence the
fraction does not depend on the squared difference $(d_y - d_x)^2$, and the
Lemma is proved. □
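
The double sum of equation (25) and the two lemmas can be explored numerically. The following sketch evaluates the double sum for a pathway with unit gains (an assumption made here for simplicity) and checks three things: the three-node value for rates 1, 2, 3, the invariance under reordering of the nodes (Lemma 1), and the fact that the value stays finite when two rates are nearly equal (Lemma 2). The function name `path_variance` and all numerical values are hypothetical.

```python
from itertools import permutations

def path_variance(rates, gains=()):
    """Double sum of equation (25) for distinct rates d_k.

    `gains` holds the coupling constants f_u; an empty tuple means unit gains.
    """
    n = len(rates)
    gain = 1.0
    for f_u in gains:
        gain *= f_u
    # c_k = 1 / prod_{a != k} (d_k - d_a), following the sign convention of (26);
    # the overall sign (-1)^(N-1) cancels in the product c_k * c_m.
    coeffs = []
    for k in range(n):
        prod = 1.0
        for a in range(n):
            if a != k:
                prod *= rates[k] - rates[a]
        coeffs.append(1.0 / prod)
    total = sum(coeffs[k] * coeffs[m] / (rates[k] + rates[m])
                for k in range(n) for m in range(n))
    return gain ** 2 * total

# Three nodes with rates 1, 2, 3: the frequency-domain integral gives 1/120.
v_123 = path_variance([1.0, 2.0, 3.0])

# Lemma 1: every ordering of the nodes gives the same variance.
orderings = [path_variance(list(p)) for p in permutations([1.0, 2.0, 3.0])]

# Lemma 2: nearly equal rates do not blow up; for two nodes the sum reduces
# to 1 / (2 d_x d_y (d_x + d_y)).
d_x, d_y = 1.0, 1.0 + 1e-4
v_close = path_variance([d_x, d_y])
v_limit = 1.0 / (2.0 * d_x * d_y * (d_x + d_y))

print(v_123, min(orderings), max(orderings))
print(v_close, v_limit)
```

For exactly equal rates the individual terms of the sum are undefined in floating point, so an implementation would either perturb the rates slightly, as done here, or use the limiting expression directly.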

4.2 Optimization of Linear Pathways 

Lemma 3. Assume that the same noise source is applied to two different
pathways with impulse responses $h_1(t)$ and $h_2(t)$ respectively. The covariance
of the signals in their output will be equal to

$$C(\tau) = \lim_{t\to\infty} E[y_1(t)\, y_2(t+\tau)] = \int_{0}^{\infty} h_1(r)\, h_2(r+\tau)\, dr. \tag{31}$$

Proof. The two outputs $y_1(t)$ and $y_2(t)$ are equal to

$$y_1(t) = \int_{-\infty}^{t} h_1(t-x)\, dW_x \quad \text{and} \quad y_2(t) = \int_{-\infty}^{t} h_2(t-y)\, dW_y \tag{32}$$

where $W_t$ is the Wiener process that drives both systems simultaneously.
Taking the expected value of the product of the first and a delayed version
of the second,

$$\begin{aligned}
C(\tau) &= \lim_{t\to\infty} E\left[ \int_{-\infty}^{t} h_1(t-x)\, dW_x \cdot \int_{-\infty}^{t+\tau} h_2(t+\tau-y)\, dW_y \right] \\
&= \lim_{t\to\infty} \int_{-\infty}^{t}\!\int_{-\infty}^{t+\tau} h_1(t-x)\, h_2(t+\tau-y)\, E[dW_x\, dW_y] \\
&= \lim_{t\to\infty} \int_{-\infty}^{t} h_1(t-s)\, h_2(t+\tau-s)\, ds \\
&= \int_{0}^{\infty} h_1(r)\, h_2(r+\tau)\, dr.
\end{aligned} \tag{33}$$

□
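
For a concrete feel of Lemma 3, suppose (as an assumption for this example only) that both pathways are single first-order filters, $h_1(r) = e^{-ar}$ and $h_2(r) = e^{-br}$. Equation (31) then gives $C(\tau) = e^{-b\tau}/(a+b)$ for $\tau \ge 0$. The sketch below compares this value with a midpoint-rule quadrature of the integral; the rates `a`, `b` and the helper name `cross_covariance` are illustrative.

```python
import math

def cross_covariance(h1, h2, tau, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of the integral of h1(r) h2(r + tau) over [0, upper]."""
    dr = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += h1(r) * h2(r + tau)
    return total * dr

a, b = 1.0, 2.0   # hypothetical pole magnitudes of the two pathways

def h1(r):
    return math.exp(-a * r)

def h2(r):
    return math.exp(-b * r)

results = []
for tau in (0.0, 0.5, 2.0):
    numeric = cross_covariance(h1, h2, tau)
    closed = math.exp(-b * tau) / (a + b)
    results.append((tau, numeric, closed))
    print(tau, numeric, closed)
```

The covariance decays with the delay at the rate of the delayed pathway, which is consistent with the later observation that delays reduce the correlation between pathways.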



Corollary 1. Assume that noise from a single noise source with standard
deviation $\sigma$ enters a network, and propagates through N independent
pathways to reach the output. If the impulse responses of the independent
pathways are $h_1(t), h_2(t), \ldots, h_N(t)$ respectively, the mean of the output
will be zero, and its variance equal to

$$V_{out} = \sigma^2 \int_{0}^{\infty} \left( \sum_{k=1}^{N} a_k h_k(x) \right)^{2} dx. \tag{34}$$

Proof. The output vertex will receive a weighted sum of the outputs of the
N independent pathways

$$z(t) = \sum_{k=1}^{N} a_k y_k(t). \tag{35}$$

Its expected value is equal to zero at all times:

$$\begin{aligned}
E[z(t)] &= E\left[ \sum_{k=1}^{N} a_k y_k(t) \right] = \sum_{k=1}^{N} E[a_k y_k(t)] \\
&= \sum_{k=1}^{N} a_k \int_{-\infty}^{t} h_k(t-x)\, \sigma\, E[dW_x] \\
&= 0.
\end{aligned} \tag{36}$$

The variance is equal to:

$$\begin{aligned}
V_{out} &= \lim_{t\to\infty} V_{out}(t) = \lim_{t\to\infty} E[z^2(t)] \\
&= \lim_{t\to\infty} \sigma^2\, E\left[ \left( \int_{-\infty}^{t} \sum_{k=1}^{N} a_k h_k(t-x)\, dW_x \right) \cdot \left( \int_{-\infty}^{t} \sum_{k=1}^{N} a_k h_k(t-y)\, dW_y \right) \right] \\
&= \lim_{t\to\infty} \sigma^2 \int_{-\infty}^{t} \left( \sum_{k=1}^{N} a_k h_k(t-s) \right)^{2} ds = \sigma^2 \int_{0}^{\infty} \left( \sum_{k=1}^{N} a_k h_k(x) \right)^{2} dx.
\end{aligned} \tag{37}$$
□
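
Equation (34) can be evaluated in closed form when each pathway is assumed to be a single first-order filter, $h_k(x) = e^{-d_k x}$; this assumption and the numerical values below are made only for illustration. Expanding the square gives $V_{out} = \sigma^2 \sum_k \sum_m a_k a_m / (d_k + d_m)$. The sketch compares this with the value that would result if the cross terms were ignored, that is, if each pathway were driven by its own independent source.

```python
def combined_variance(weights, rates, sigma=1.0):
    """Equation (34) for first-order pathways h_k(x) = exp(-d_k x), one shared noise source."""
    n = len(weights)
    return sigma ** 2 * sum(weights[k] * weights[m] / (rates[k] + rates[m])
                            for k in range(n) for m in range(n))

def independent_variance(weights, rates, sigma=1.0):
    """Same pathways without cross terms (each pathway driven by its own source)."""
    return sigma ** 2 * sum(w * w / (2.0 * d) for w, d in zip(weights, rates))

rates = [1.0, 2.0]   # hypothetical pathway rates

v_same_sign = combined_variance([0.5, 0.5], rates)
v_opposite = combined_variance([0.5, -0.5], rates)
v_independent = independent_variance([0.5, 0.5], rates)

print("shared source, weights (+, +):", v_same_sign)
print("shared source, weights (+, -):", v_opposite)
print("independent sources          :", v_independent)
```

With weights of the same sign the correlated contributions add up and the variance exceeds the independent case; with opposite signs they partially cancel.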

Suppose we have a linear pathway with each element representing a single- 
pole linear filter, and we need to pick the position of the poles such that the 
variance in the output is minimized. The next lemma shows an easy way to 
find the pathway if all its vertices are identical and subject to the symmetric 
constraints. 

Definition 1. A symmetric multivariable function $f : \mathbb{R}^n \to \mathbb{R}$ is a function
for which $f(x) = f(\pi(x))$, where $\pi(x)$ is an arbitrary permutation of the input
vector x.

Lemma 4. Assume that a symmetric multivariable function $f : \mathbb{R}^n \to \mathbb{R}$
is nowhere constant and has a sign definite Hessian matrix. Then it has a
unique extremum under symmetric constraints, such that all the elements of
the input vector x are equal.

Proof. Since the Hessian has the same sign everywhere, the function $f$ is
strictly convex or strictly concave. We will assume that $f$ is strictly convex,
noting that the proof is similar when $f$ is concave. Assume that the
extremum of the function $f$ is equal to $f^*$, and the argument that achieves
this is $x^*$. Further assume that $\min(x^*) = m$ and $\max(x^*) = M$ are the
minimum and maximum elements of the vector $x^*$ respectively. Since $f$ is
symmetric,

$$f(m, M, x_3^*, \ldots, x_n^*) = f(M, m, x_3^*, \ldots, x_n^*) = f^* \tag{38}$$

where the arguments still satisfy the symmetric constraints. But since $f$ is
strictly convex, every convex combination $(a, b) = t\,(m, M) + (1-t)\,(M, m)$
of these values, with $0 < t < 1$, will satisfy

$$\begin{aligned}
f(a, b, x_3^*, \ldots, x_n^*) &< t f(m, M, x_3^*, \ldots, x_n^*) + (1-t) f(M, m, x_3^*, \ldots, x_n^*) \\
&= t f^* + (1-t) f^* \\
&= f^*.
\end{aligned} \tag{39}$$

Generalizing the last argument, it is straightforward to see that

$$f(x_1, x_2, \ldots, x_n) = f^* \quad \text{for every } m \le x_1, x_2, \ldots, x_n \le M. \tag{40}$$

Therefore, $f(x)$ needs to be constant in that area, which contradicts the
assumption that the function has sign definite Hessian. □

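When the constraints are convex but not necessarily symmetric, then we
can use the Lagrangian to find the optimal parameters. Coming back to the
linear pathway network, and assuming that the input is white noise, if the
poles of the different nodes are placed at $a_1, a_2, \ldots, a_N$, the total variance in
the output is equal to (see equation (22)):

$$\begin{aligned}
V_{out}(a_1, a_2, \ldots, a_N) &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \left| \frac{1}{j\omega + a_1} \right|^{2} \left| \frac{1}{j\omega + a_2} \right|^{2} \cdots \left| \frac{1}{j\omega + a_N} \right|^{2} d\omega \\
&= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{1}{\omega^2 + a_1^2} \cdot \frac{1}{\omega^2 + a_2^2} \cdots \frac{1}{\omega^2 + a_N^2}\, d\omega.
\end{aligned} \tag{41}$$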



The function $V_{out}$ is convex with respect to all its arguments $a_1, a_2, \ldots, a_N$,
as an (infinite) sum of products of convex functions. Consequently, it has a
unique minimum under convex constraints.
The Lagrangian of the function for $V_{out}$ is

$$\mathcal{L}(a_1, a_2, \ldots, a_N) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{1}{\omega^2 + a_1^2} \cdot \frac{1}{\omega^2 + a_2^2} \cdots \frac{1}{\omega^2 + a_N^2}\, d\omega - \lambda\, g(a_1, a_2, \ldots, a_N). \tag{42}$$



Differentiating with respect to $a_k$, under the Leibniz integral rule:

$$\frac{\partial \mathcal{L}}{\partial a_k} = 0 \implies \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{1}{\omega^2 + a_1^2} \cdots \frac{-2 a_k}{(\omega^2 + a_k^2)^2} \cdots \frac{1}{\omega^2 + a_N^2}\, d\omega = \lambda\, \frac{\partial g(a_1, \ldots, a_N)}{\partial a_k} \tag{43}$$

for every k. Differentiating with respect to all the parameters will give us N
equations, and we have one more equation by requiring $g(a_1, \ldots, a_N) = 0$. So
we can solve the system of N + 1 equations and N + 1 unknowns $\lambda, a_1, \ldots, a_N$,
which is guaranteed to have a unique solution as all functions are convex.

In conclusion, we can find the unique minimum of the variance of a linear 
pathway, when each node is a single pole linear filter with real negative 
poles. Given that a linear tree network with independent noise inputs can 
be decomposed to many linear pathways, this method can be applied to any 
arbitrary network without cycles. 
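
A minimal numerical illustration of the symmetric optimum, under assumptions not stated in the text: take N = 2 nodes and the hypothetical constraint $g(a_1, a_2) = a_1 + a_2 - c = 0$. For two nodes the integral in equation (41) has the closed form $V_{out} = 1 / (2 a_1 a_2 (a_1 + a_2))$, so a simple grid search over the constraint line is enough to locate the minimum. The budget `c` and the grid spacing are arbitrary.

```python
def two_node_variance(a1, a2):
    """Equation (41) for N = 2: (1 / 2 pi) * pi / (a1 a2 (a1 + a2))."""
    return 1.0 / (2.0 * a1 * a2 * (a1 + a2))

c = 4.0       # hypothetical budget for the constraint a1 + a2 = c
step = 0.01   # grid spacing along the constraint line

candidates = []
for i in range(1, int(c / step)):
    a1 = i * step
    a2 = c - a1
    candidates.append((two_node_variance(a1, a2), a1, a2))

best_variance, best_a1, best_a2 = min(candidates)

print("best a1, a2  :", best_a1, best_a2)
print("min variance :", best_variance)
print("symmetric val:", two_node_variance(c / 2.0, c / 2.0))
```

The search settles on equal pole magnitudes, in line with Lemma 4; for larger N or asymmetric constraints the stationarity conditions of the Lagrangian would have to be solved instead.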

5 Correlation, Feedforward and Feedback Cycles

In a serial pathway where each vertex acts as a filter, the output at each 
node has a different frequency content as the noise propagates through the 
network, being filtered at each step. The variance at each node is decreasing 
as we move further from the noise source, as is shown in Figure 1. As 
the serial pathway becomes longer, the input and the output become less 
correlated since their distance increases. In addition, every node changes 
the phase of its inputs, which also contributes to the decreased correlation. 
Therefore, applying negative feedback or setting up a feedforward cycle can 
only have a measurable effect if the cycle length is relatively small. Figure 
2 shows the covariances and correlations among the vertices of two simple 
linear pathways, one unidirectional and one bidirectional, as they are depicted in
Figure 1. 

Cycles can significantly increase the effect of noise in the system. There 
are two reasons for this: First the noise can now reach more vertices since 
the average distance among nodes decreases, and second, every node now 
receives the same disturbance from at least two different paths, and the two 
signals are correlated, contributing to larger variance. An example is shown 
in Figure 3, where we compare the average variance of two systems whose 
only difference is the connection between the first and the last node. Both 
networks receive the same inputs, but in the cycle network, the variance is 
much larger. The result of the noise is even more pronounced when there is 
correlation among the noise inputs to different nodes. 
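
The comparison between a serial pathway and the same pathway closed into a cycle can be reproduced on a small scale. The sketch below is not the computation behind Figure 3; it is a hedged illustration in which every node is a first-order filter with decay rate `d`, each node feeds the next with gain `f`, and every node receives an independent unit white noise input (all of these are assumptions). The steady-state covariance P solves $A P + P A^T + Q = 0$, which is the matrix form of the integral in equation (19), and is obtained here by iterating the corresponding differential equation to its fixed point.

```python
def build_pathway(n, d=1.0, f=0.5, cycle=False):
    """State matrix of n first-order nodes in series; cycle=True feeds the last node into the first."""
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        A[k][k] = -d
        if k > 0:
            A[k][k - 1] = f
    if cycle:
        A[0][n - 1] = f
    return A

def steady_state_covariance(A, Q, dt=0.01, steps=6000):
    """Iterate dP/dt = A P + P A^T + Q with forward Euler until it settles."""
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        # P stays symmetric, so P A^T equals the transpose of A P.
        P = [[P[i][j] + dt * (AP[i][j] + AP[j][i] + Q[i][j]) for j in range(n)]
             for i in range(n)]
    return P

def average_variance(n, f=0.5, cycle=False):
    A = build_pathway(n, f=f, cycle=cycle)
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = steady_state_covariance(A, Q)
    return sum(P[i][i] for i in range(n)) / n

path_avg = average_variance(4, cycle=False)
cycle_avg = average_variance(4, cycle=True)

print("serial pathway:", path_avg)
print("cycle         :", cycle_avg)
print("ratio         :", cycle_avg / path_avg)
```

The gain `f` is kept below the decay rate `d` so that the cycle remains stable; closer to that limit the ratio grows quickly.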

The effect of cycles on the output noise can be reduced if we make sure
that each independent pathway also changes the phase of its input by a
different amount. Different phases in the output (for at least a relatively large
frequency spectrum) will ensure that the various frequencies partially cancel
each other, reducing the output variation. When a pathway significantly
reduces the frequency content, or has small gain for most frequencies, then
correlations do not play a significant role. This behavior is clearly shown in
Figure 4 for a unidirectional cycle and in Figure 5 for a bidirectional cycle.
Phase shifts in a pathway are equivalent to time delays, as we will see in the
next section.

Figure 1: Variance of the output of a unidirectional and a bidirectional serial
pathway as a function of the pathway length. All nodes are assumed to be identical
single-pole filters. In the unidirectional pathway, each node is affected only by the
node immediately preceding it, whereas in the bidirectional pathway each
intermediate vertex is receiving input from the node preceding and the node succeeding
it. The bidirectional pathway is much more efficient in filtering out noise. The
variance for both pathways decreases with the pathway length. The bidirectional
pathway has variance very close to zero even when it is relatively short.

Similarly, negative feedback carefully applied to a network contributes to 
better disturbance rejection. When the disturbance is white noise, the effect 
of feedback is smaller as the feedback cycle gets longer. 
(a) Unidirectional Pathway Covariance Matrix
(b) Unidirectional Pathway Correlation Matrix
(c) Bidirectional Pathway Covariance Matrix
(d) Bidirectional Pathway Correlation Matrix

Figure 2: Covariance and correlation among all pairs of nodes in a linear pathway.
Every square (x, y) in the matrices above corresponds to the value of the
correlation $R_{x,y}(\tau = 0)$ of nodes of distance x and y from the origin, $0 \le x, y \le N - 1$.
The larger the correlation, the darker the respective square. As the distance $|x - y|$
among the nodes increases, their covariance and correlation decrease. The
covariance among nodes of the same distance in the unidirectional pathway decreases,
and the correlation among them increases towards the end. The covariance of the
nodes in the bidirectional pathway is essentially zero within a small distance, and
the correlation is larger even when the distance is relatively large.
Figure 3: Average variance of all nodes in a network in a cycle as compared to
an identical network without the feedback loop. Every node has a noise input
which is then spread through the network. The average variance of all the nodes
for the cycle is normalized by the variance of the respective serial pathway.
The variance of the cycle is always much larger than the variance of the simple
serial pathway when the noise inputs for each node are uncorrelated (bottom left).
The ratio becomes even larger when the inputs are correlated (bottom right).



The correlation and covariance among vertices decreases with distance 
and the variance of each node decreases as the length of the pathway in- 
creases. Furthermore, as we move towards the end of the pathway, the co- 
variance of nodes of a given distance decreases but the correlation of nodes 
of a given distance increases. The last observation is easily justified taking 
into account that each new node introduces a virtual filter, and the output 
of nodes will tend to have very similar frequency content the more filters it 
has gone through. Moreover, from the Bode plot of a filter, we can easily see 
that for the frequencies that are not affected by the filter, their phase is also 
relatively unaffected, which does not decrease their correlation. 

The previous analysis hints to the fact that feedback cycles have limited 
utility when applied to long pathways. Figure 6 shows the variance of the 
output after we apply negative feedback to a linear pathway. The darkness 
of each element (m, n) of the upper triangular matrix shows the standard 
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(b) (c) 

Figure 4: A network consisting of a feedforward cycle and the corresponding noise 
strength in its output. If the nodes of the network have poles with relatively 
small absolute values, then the output variance may be larger than the variance 
in the intermediate nodes. A fixed number of identical nodes is divided into two 
pathways, whose output is combined in the output node. If the number of nodes 
is similar in both pathways, then their outputs are highly correlated, and when 
combined produce large random swings. This does not happen when the poles of 
each node have a large negative real part (right). In the first case, the poles are 
placed at a = — 1 whereas in the second the poles are place at a = —1.5. 

deviation of the pathway output when we apply feedback from node n to 
node m. As one would expect, the effect of feedback is directly proportional 
to the correlation between the source and target vertices. The same holds 
for feedforward loops, both positive and negative. 

In the case of a negative feedforward loop, the variance in the output
increases as the loop length increases. When the feedforward interaction is
positive, the variance decreases at first, since the correlation among the
different states also decreases, but then goes up, partly because when it affects

Figure 5: Correlations increase the variance in bidirectional networks. If the out- 
puts of two pathways that are correlated are combined, then the output has rel- 
atively large variance. Here, a single output receives input from two pathways of 
different lengths, which consist of identical nodes. Bidirectional pathways filter 
noise very effectively as shown before, and the output variance is still small. 

a node towards the end of the pathway, it does not pass through successive
filters, so the variance does not have the chance to decrease (see Figure 7).

5.1 Delayed Feedforward and Feedback Cycles 

As one would expect, adding delay to the interactions among any nodes 
in a network driven by noise decreases their correlation, meaning that any 
feedforward or feedback cycles will have a smaller effect. The covariance 
of a white noise process with a delayed version of the same signal can be

Figure 6: A serial pathway with a unit feedback loop: (a) feedback topology;
(b) output variance. The matrix on the right consists of squares (m, n), each of
which represents the variance of the output when feedback is applied from node n
to node m. The result of the feedback loop only depends on the distance
d = |n - m|, and the variance decreases as the length of the feedback loop
becomes smaller, and vice versa.



[Figure 7 panels: (a) feedforward loop, from noise input to output; (b) negative feedforward loop and (c) positive feedforward loop, each plotted against loop length.]

Figure 7: Output variance of a linear pathway when the input is white noise, 
and we add a negative (left) or positive (right) feedforward loop starting from 
the first vertex. For the positive loop, the variance is largest when we connect 
nearby vertices (large correlation) or we connect an early vertex to the end of the 
pathway, since it has a large variance that is transmitted directly to the output 
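without being further filtered.

computed the same way as in equation (18):

\[
\begin{aligned}
V_\tau[y] &= \lim_{t\to\infty} E\left[y(t)\,y^T(t+\tau)\right] \\
&= \lim_{t\to\infty} E\left[\left(\int_{-\infty}^{t} h(t-r)\,\Sigma_r\,dW_r\right)\left(\int_{-\infty}^{t+\tau} h(t+\tau-s)\,\Sigma_s\,dW_s\right)^T\right] \\
&= \lim_{t\to\infty} \int_{-\infty}^{t}\int_{-\infty}^{t+\tau} h(t-r)\,\Sigma_r\,E\left[dW_r\,dW_s^T\right]\Sigma_s^T\,h^T(t+\tau-s) \\
&= \lim_{t\to\infty} \int_{-\infty}^{t}\int_{-\infty}^{t+\tau} h(t-r)\,\Sigma_r\,\sqrt{dr}\,\delta(s-r)\,\sqrt{ds}\,\Sigma_s^T\,h^T(t+\tau-s) \\
&= \lim_{t\to\infty} \int_{-\infty}^{t} h(t-s)\,V_s\,h^T(t+\tau-s)\,ds.
\end{aligned}
\tag{44}
\]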

If the system is causal, linear and time invariant, and the disturbance is white 
noise of constant strength added to the input, 

\[
V_\tau[y] = \int_{0}^{\infty} h(u)\,V\,h^T(u+\tau)\,du. \tag{45}
\]

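As a specific example, if the impulse response is $h(t) = Ce^{At}B$ and the
covariance matrix is constant:

\[
\begin{aligned}
V_\tau[y] &= \int_{0}^{\infty} Ce^{As}BVB^T e^{(s+\tau)A^T}C^T\,ds \\
&= C\left(\int_{0}^{\infty} e^{As}BVB^T e^{sA^T}\,ds\right)e^{\tau A^T}C^T.
\end{aligned}
\tag{46}
\]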



Note that the last equation is similar to equation (19), except for the
exponential delay term at the end. We assume that all eigenvalues of the dynamical
matrix A have negative real parts; otherwise the system is not stable. If the delay is
$\tau > 0$,
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(47) 



= II || - 

The matrix norm used here is the first order elementwise norm, since we 
are usually interested in the average variance of all parts of the network. 

\[
\|V\| = \sum_{i=1}^{N}\sum_{j=1}^{N} |V_{ij}|. \tag{48}
\]

If we only know the autocorrelation function of the disturbance, we can 
compute the output variance by moving to the frequency domain. 

\[
\begin{aligned}
V_\tau[y] &= \int_{-\infty}^{+\infty} S_y(f)\cos(2\pi f\tau)\,df \\
&= \int_{-\infty}^{+\infty} |H(f)|^2 S_x(f)\cos(2\pi f\tau)\,df \\
&= \frac{1}{2\pi}\int_{-\infty}^{+\infty} \left|C(j\omega I - A)^{-1}B\right|^2 \left(\int_{-\infty}^{+\infty} R_x(u)\cos(\omega u)\,du\right)\cos(\omega\tau)\,d\omega.
\end{aligned}
\tag{49}
\]

The shape of the autocorrelation function is a good indicator of how a
feedback or feedforward loop will affect the output variation. A correlation
function that quickly goes to zero as $\tau$ increases shows that the feedback cycle
will not change the variance of the output much. Conversely, a random
signal with a correlation structure can be easily filtered out by applying an
appropriate feedback mechanism.
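As a sanity check of equations (45) and (46), consider the scalar case of a single-pole filter, h(t) = exp(-a t) with a > 0, driven by white noise of constant strength V. The integral then has the closed form V exp(-a tau) / (2a), which decays with the delay tau. The Python sketch below compares a numerical quadrature with that closed form. The function names, the parameter values and the truncation of the infinite integral at a finite horizon are illustrative assumptions, not part of the analysis above.

```python
import math


def delayed_covariance_numeric(a, V, tau, horizon=60.0, steps=200_000):
    """Approximate the integral of h(u) * V * h(u + tau) over u >= 0
    for h(t) = exp(-a t), using the midpoint rule on [0, horizon]."""
    du = horizon / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += math.exp(-a * u) * V * math.exp(-a * (u + tau))
    return total * du


def delayed_covariance_closed_form(a, V, tau):
    """Scalar version of equation (46): V / (2a) * exp(-a * tau)."""
    return V / (2.0 * a) * math.exp(-a * tau)


if __name__ == "__main__":
    a, V = 1.5, 2.0  # hypothetical pole location and noise strength
    for tau in (0.0, 0.5, 1.0, 2.0):
        print(tau,
              delayed_covariance_numeric(a, V, tau),
              delayed_covariance_closed_form(a, V, tau))
```

The faster the filter (larger a), the more quickly the delayed covariance vanishes, which is why a delayed loop around a fast node has little effect on the output variance.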



5.2 Minimization of the Average Vertex Variance 

In a general network, signals are propagated from one node to its neighbors. 
Every vertex receives a filtered version of the noise signal, since every node 
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acts as a single pole filter. The pole is always real, and proportional to the 
degree of each vertex, if we assume that each node receives input proportional 
to the differences of concentrations among its neighbors and itself, or that 
nodes that interact with many others have proportionally large degradation 
rates. In this case, we can model the dynamics of a first order linear network 
through its Laplacian matrix. In such a network, the state of each node Xk 
follows the differential equation 



where $a_{km} > 0$ for every $k, m \in V$. The graph Laplacian has been
used to model a wide range of systems, including formation stabilization
for groups of agents, collision avoidance of swarms and synchronization of 
coupled oscillators [10]. It can also be used in biological and chemical reaction 
networks, if the degradation rate of each species is equal to the sum of the 
rates with which it is produced. In this section, we will model the dynamics of 
each network with its Laplacian matrix, where each node is affected by a noise 
source which is independent of all other nodes, but has the same standard 
deviation. Given that each vertex contributes equally to the overall noise 
measure of the graph, and since the noise entering each node propagates 
towards all its neighbors, we can use Lemma 4 to see that the degrees of 
the network vertices have to be as similar as possible (see also [11] and [4]). 
In addition, Figure 3 shows that the cycles need to be as long as possible 
in order to avoid any correlations of signals through two different paths. 
For longer cycles, the noise inputs go through more filters before they are 
combined. Moreover, the phase shift is larger for all their frequencies, which 
reduces their correlation. On the other hand, there are bounds on how long 
a cycle can be given the network's order and size. Networks with long cycles 
tend to have large radius and larger average distance, as shown in [12], which 
makes noise harder to propagate, having to pass through many filters. By the 
same token, networks with a small clustering coefficient will tend to be more
immune to noise in their output, since networks with a large clustering
coefficient tend to contain cliques or densely connected subnetworks [13], which
facilitate noise propagation, especially if the noise sources that affect the
nodes are correlated, as shown
in previous sections. A method to find these graphs is first to determine their
degree sequence, and then determine which one has the largest average cycle 
length. This procedure can be simplified by working recursively, building 
networks with progressively larger order and size. 




(50)

Lemma 5. There is always a connected graph of order N and size m in
which there are k vertices with degree d+1 and N - k vertices with degree d,
where

\[
d = \left\lfloor \frac{2m}{N} \right\rfloor \quad\text{and}\quad k = 2m - Nd. \tag{51}
\]

Proof. We will prove the existence of such a graph by starting with its degree 
distribution and, by successive transformations, convert it to a graph that 
is known to exist. Specifically, at each step we will remove one vertex along 
with its edges, repeating the process until we end up having a cycle graph. 
Assume that the degree sequence of the graph $\mathcal{G}$ is as above, and we arrange
the degrees of the vertices in decreasing order:

\[
s_0 = \{\underbrace{d+1, d+1, \ldots, d+1}_{k \text{ vertices}},\ \underbrace{d, d, \ldots, d}_{N-k \text{ vertices}}\}. \tag{52}
\]
According to the Havel-Hakimi theorem [11], the above sequence is a graph
sequence if and only if the sequence obtained by removing the largest-degree
vertex, after connecting it to vertices 2, 3, ..., d + 2, is also a graph
sequence. The new graph will have a degree sequence of

\[
s_1 =
\begin{cases}
\{\underbrace{d+1, \ldots, d+1}_{k-d-2 \text{ vertices}},\ \underbrace{d, \ldots, d}_{N-k+d+1 \text{ vertices}}\} & \text{if } d \le k-2, \\[2ex]
\{\underbrace{d, \ldots, d}_{N+k-d-3 \text{ vertices}},\ \underbrace{d-1, \ldots, d-1}_{d-k+2 \text{ vertices}}\} & \text{if } d > k-2.
\end{cases}
\tag{53}
\]



The key observation is that the transformation above preserves the property
of degree homogeneity; in other words, in the new graph
$\mathcal{G}_1 = \mathcal{G}_1(N-1, m-d-1)$, the minimum and maximum vertex degrees are

\[
d_{\min} = \left\lfloor \frac{2(m-d-1)}{N-1} \right\rfloor \tag{54}
\]

and

\[
d_{\min} \le d_{\max} \le d_{\min} + 1. \tag{55}
\]

Repeating the process, there will be a graph $\mathcal{G}_r$ with at least one vertex of
degree $d_{\min} = 1$. It follows from the analysis above that the graph $\mathcal{G}_r$ will
include either one or two vertices of degree $d_{\min} = 1$. If it has two vertices
with degree one, it is the path graph. If it has only one vertex with degree
one, its degree sequence is not a graph sequence. But this would mean that
the sum of all the degrees is an odd number, which is not possible, since
at every transformation, we remove $2d_{\max}$ from the sum of degrees. The
graph $\mathcal{G}_r$ is a connected graph, and implementing the inverse transforms, we
connect new vertices to an already connected network, which guarantees that
the final graph is connected. □
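The degree sequence of Lemma 5 and the Havel-Hakimi reduction used in the proof can be sketched in a few lines of Python. The function names below are hypothetical, and the test only establishes that the sequence is realizable by some simple graph; it does not by itself check connectivity, which the proof obtains separately from the inverse transformations.

```python
def homogeneous_degree_sequence(n_vertices, n_edges):
    """Degree sequence of Lemma 5: k vertices of degree d + 1 and
    N - k vertices of degree d, with d = floor(2m / N), k = 2m - N d."""
    d = (2 * n_edges) // n_vertices
    k = 2 * n_edges - n_vertices * d
    return [d + 1] * k + [d] * (n_vertices - k)


def is_graphical(degrees):
    """Havel-Hakimi test: repeatedly remove the largest degree and
    subtract one from that many of the next-largest degrees."""
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        largest = seq.pop(0)
        if largest > len(seq):
            return False          # not enough remaining vertices
        for i in range(largest):
            seq[i] -= 1
            if seq[i] < 0:
                return False      # a vertex would get negative degree
        seq.sort(reverse=True)
    return True


if __name__ == "__main__":
    N = 6
    for m in range(N - 1, N * (N - 1) // 2 + 1):
        seq = homogeneous_degree_sequence(N, m)
        print(m, seq, is_graphical(seq))
```

For N = 6 the loop covers every size from the tree (m = 5) to the complete graph (m = 15), the same range considered in Figure 8.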

For networks with a small number of vertices, we can find all graphs
with the desired degree sequence, and among them, exhaustively search for
the ones with the largest average cycle length that have the smallest average
variance. For N = 6 nodes, all connected networks (with 5 ≤ m ≤ 15 edges)
with most homogeneous degree distribution and longest average cycles are 
shown in Figure 8. 



Figure 8: All connected networks of order N = 6 and size 5 ≤ m ≤ 15 and
with minimum output variance. We assume that every vertex is affected by an 
independent noise source. In addition, each vertex acts as a single pole filter. The 
total noise of the network is measured as the average of the variances of all nodes. 

To summarize this section, positive correlations increase the output vari- 
ance, and cycles create correlations that make the system more prone to 
random inputs. The longer the cycles, the smaller their effect. The immu- 
nity to noise is increased when pathways with the same output introduce 
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different phase shifts, so that the different noise contributions cancel each 
other at least partially. This result holds both for feedforward and feedback 
loops. When there is a convex constraint on the strength of the various
filters (that is, on the placement of their poles), we can find the optimal
placement such that the
output noise is reduced. Specifically, for a linear network where all nodes
act as single pole filters and the dynamics of the network are described by 
its Laplacian matrix, there is a systematic way to find the network with the 
smallest average variance. The optimal networks have homogeneous degree 
distribution, and cycles that are as long as possible. 

6 Crosstalk Reduces Noise In Pathway Outputs

6.1 Motivating Example 





Figure 9: A simple circuit with two noise sources. The two resistors generate 
thermal noise, which is modeled as current sources in parallel to them. When the 
switch is open, the two circuits are independent. When the switch is closed, the 
noise in both outputs has smaller variance than before. 

Assume that we have a resistor without any external voltage source. If
we measure the voltage between its endpoints, we will find that in any
infinitesimal frequency interval df there is thermal noise $V_t$ with

\[
E[V_t] = 0 \quad\text{and}\quad E[V_t^2] = 4kTR\,df, \tag{56}
\]

where R is the resistance, T is the absolute temperature and k is Boltzmann's
constant. The above equation shows that the noise increases
as temperature and resistance increase. We connect a capacitor in parallel 
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with the resistor, and measure the voltage between its endpoints. We are 
interested in the total amount of variance of the voltage in the output of 
the parallel combination of the resistor and the capacitor. When the switch
is open, each of the two subcircuits operates independently, and the output
variance for both of them is

\[
E[V_{\text{out}}^2] = \frac{kT}{C}, \tag{57}
\]

where C is the capacitance in each subcircuit. If we close the switch, the output variance is

\[
E[V_{\text{out}}^2] = \frac{kT}{C}\cdot\frac{C+D}{C+2D}. \tag{58}
\]



If the capacitor that connects the two subcircuits has capacitance D > 0
and the two noise sources are uncorrelated, then both outputs have smaller
variances.
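One way to cross-check the closed-switch expression is the equipartition theorem, under which the equilibrium covariance of the node voltages equals kT times the inverse of the capacitance matrix. This is a standard thermal-equilibrium argument, offered here as an independent check rather than as the derivation used in the text. The sketch assumes the coupling capacitor D sits between the two outputs, and all function names and numerical values are illustrative.

```python
def output_variance_open(kT, C):
    """kT/C noise of a single RC branch (switch open)."""
    return kT / C


def output_variance_closed(kT, C, D):
    """Closed-switch variance via equipartition: kT times the (1, 1)
    entry of the inverse of the capacitance matrix
    [[C + D, -D], [-D, C + D]]."""
    det = (C + D) ** 2 - D ** 2   # equals C * (C + 2 D)
    return kT * (C + D) / det


def output_variance_closed_formula(kT, C, D):
    """Expression quoted in the text: (kT / C) * (C + D) / (C + 2 D)."""
    return (kT / C) * (C + D) / (C + 2.0 * D)


if __name__ == "__main__":
    kT = 1.380649e-23 * 300.0     # joules, at an assumed 300 K
    C = 1e-12                     # farads (hypothetical value)
    for D in (0.0, 0.5e-12, 1e-12, 1e-11):
        print(D, output_variance_open(kT, C), output_variance_closed(kT, C, D))
```

As D grows, the closed-switch variance falls from kT/C towards kT/(2C): the coupling can at most halve the variance of each output in this two-branch circuit.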

In biology, there are countless sources of noise, and the noise is often 
larger than the signal itself. It is possible that the cell needs to employ the 
same technique for reducing noise, distributing it among many different and 
unrelated components. Crosstalk between different elements of a biological 
network couples the behavior of different parts of the network, introducing 
more poles in the network dynamics, as we will see next. This is equivalent to 
introducing capacitances between random parts of an electrical network. The 
new system filters out noise much more effectively, but on the other hand 
may be slower to react to various inputs, so there seems to be a tradeoff 
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between how fast a network can respond to changes and how well it filters 
out noise. The next section studies the effect of crosstalk on the behavior of 
a small network. 

6.2 Crosstalk on Single Nodes 

We analyze the four simple subgraphs of Figure 10. 




(a) (b) (c) (d) 

Figure 10: Crosstalk topologies involving one network node. (a) A node without
crosstalk interactions with white noise input having standard deviation equal to
σ. (b) A node with crosstalk interaction with one other node in the network,
which also is affected by noise with standard deviation ζ. (c) Same as before,
but we assume that both the crosstalk and the noise are increased, (d) Crosstalk 
interactions with many other nodes, each of which has an independent noise input 
of the same strength. See text for quantitative analysis of these subsystems. 



For simplicity, we may disregard any deterministic inputs, since we assume 
these are linear systems, and any deterministic inputs only affect the output 
mean, but not its variance. The stochastic differential equations for all the 
systems are shown next. 

System (a) obeys a simple stochastic differential equation, with one noise 
input, and it has no other interactions with any other parts of the network. 

\[
dX = -aX\,dt + \sigma\,dW_t. \tag{59}
\]

We have found the solution to this equation in the first section, and the
variance in the output is equal to

\[
V_a = \frac{\sigma^2}{2a}. \tag{60}
\]
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This is the trivial case without any crosstalk, and will be used for comparison 
to the performance of the other subnetworks. 

Subsystem (b) consists of one vertex that interacts with another node 
which may also be prone to other noise sources. Crosstalk is modeled through 
a new vertex in the network, with which the studied node exchanges flows. 
In chemical reaction networks for example, the species of interest X may be
forming a complex Y with species I, whose concentration is supposed to be
constant:

\[
X + I \underset{f}{\overset{c}{\rightleftharpoons}} Y. \tag{61}
\]

We also expect X to have a constant degradation rate a. The equations for
the concentrations of X and Y are

\[
\begin{aligned}
dX &= -(a+c)X\,dt + fY\,dt + \sigma\,dW_t \\
dY &= cX\,dt - fY\,dt + \zeta\,dU_t
\end{aligned}
\tag{62}
\]

and the output variance is

\[
V_b = \frac{a+f}{2a(a+c+f)}\,\sigma^2 + \frac{f}{2a(a+c+f)}\,\zeta^2. \tag{63}
\]

The next step is to see what happens if we increase the crosstalk intensity.
We can distinguish two cases. The first is when there is crosstalk with one
other node (Figure 10(c)). In the chemical reaction network analogy,

\[
X + A \underset{n\cdot f}{\overset{n\cdot c}{\rightleftharpoons}} Y. \tag{64}
\]

It is straightforward to find the new differential equations, and the variance
in the output:

\[
\begin{aligned}
dX &= -(a+nc)X\,dt + nfY\,dt + \sigma\,dW_t \\
dY &= ncX\,dt - nfY\,dt + n\zeta\,dU_t
\end{aligned}
\tag{65}
\]

\[
V_c = \frac{a+nf}{2a\left(a+n(c+f)\right)}\,\sigma^2 + \frac{n^3 f}{2a\left(a+n(c+f)\right)}\,\zeta^2. \tag{66}
\]

Finally, we consider the case where one node has crosstalk interactions with 
many different nodes, each of which is affected by a different noise process 
(Figure 10(d)). The equations that the nodes obey are 



\[
\begin{aligned}
dX &= -(a+nc)X\,dt + f\sum_{k=1}^{n} Y_k\,dt + \sigma\,dW_t \\
dY_k &= cX\,dt - fY_k\,dt + \zeta\,dU_t^k, \qquad 1 \le k \le n
\end{aligned}
\tag{67}
\]

[Figure 11 panels (a) and (b): horizontal axis, amount of crosstalk.]



Figure 11: Output variance as a result of noise input for a single vertex in the 
network in the existence of crosstalk interactions with other vertices, (a) Out- 
put variance as a function of the amount of crosstalk (concentration of crosstalk 
complex), when no additional noise is introduced. Crosstalk clearly mitigates the 
output variance. Also, having crosstalk with two independent nodes reduces the 
variance even more, compared to having a single crosstalk node, (b) Normal- 
ized output variance as a fraction of the variance when only one crosstalk node 
is present. Having many small sources of crosstalk is clearly better than having 
one strong crosstalk interaction. For the same amount of total crosstalk, dividing 
it among many nodes drives the output noise variance to zero as the number of 
nodes grows large. 



and the output variance can be computed as

\[
V_d = \frac{a+f}{2a(a+nc+f)}\,\sigma^2 + \frac{nf}{2a(a+nc+f)}\,\zeta^2. \tag{68}
\]

When no noise is introduced from the crosstalk nodes (ζ = 0), crosstalk
reduces the output variance. Figure 11 compares the last three cases, as the 
strength of crosstalk interactions among the nodes increases. The crosstalk 
strength in this case is quantified by the ratio 

r. = (69) 

which is equal to the concentration of the crosstalk product Y in equation 
(62) in the absence of degradation rates and noise inputs. It is shown that 
distributing the crosstalk among many nodes (equation (68)) decreases the 
effect of noise noticeably more compared to the single node case. This is 
even more pronounced when we normalize by the variance in the base case 
(equation (63)). 
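The variance expressions (60), (63), (66) and (68) are simple enough to evaluate directly, and the single-node case can be checked against the covariance equation dP/dt = A P + P A^T + Q of system (62). The Python sketch below does both. The function names, the forward-Euler integrator and the parameter values are assumptions made for illustration; the only inputs taken from the text are the four formulas and the drift and noise terms of (62).

```python
def variance_no_crosstalk(a, sigma):
    """Equation (60)."""
    return sigma ** 2 / (2.0 * a)


def variance_single(a, c, f, sigma, zeta):
    """Equation (63): crosstalk with one other node."""
    den = 2.0 * a * (a + c + f)
    return ((a + f) * sigma ** 2 + f * zeta ** 2) / den


def variance_scaled(a, c, f, n, sigma, zeta):
    """Equation (66): rates and crosstalk noise scaled by n."""
    return variance_single(a, n * c, n * f, sigma, n * zeta)


def variance_distributed(a, c, f, n, sigma, zeta):
    """Equation (68): crosstalk with n independent nodes."""
    den = 2.0 * a * (a + n * c + f)
    return ((a + f) * sigma ** 2 + n * f * zeta ** 2) / den


def steady_state_variance_ode(a, c, f, sigma, zeta, dt=1e-3, t_end=60.0):
    """Integrate the covariance equation of system (62) with forward
    Euler and return Var[X] at t_end."""
    p, q, r = 0.0, 0.0, 0.0  # Var[X], Cov[X, Y], Var[Y]
    for _ in range(int(t_end / dt)):
        dp = 2.0 * (-(a + c) * p + f * q) + sigma ** 2
        dq = c * p - (a + c + f) * q + f * r
        dr = 2.0 * (c * q - f * r) + zeta ** 2
        p, q, r = p + dt * dp, q + dt * dq, r + dt * dr
    return p
```

With ζ = 0, comparing `variance_scaled` and `variance_distributed` for the same n shows the distributed case is never larger, since the two numerator-denominator cross products differ by n f c (1 - n) ≤ 0.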
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Figure 12: Normalized variance of the output when the crosstalk introduces ad- 
ditional noise. Having strong crosstalk interactions with one single node increases 
the variance because noise propagates easily. When crosstalk is distributed among 
many nodes, the variance may be smaller or larger than before, depending on the 
strength of the interactions. This is because having crosstalk interactions with 
many other vertices introduces a proportional amount of noise. 

When crosstalk introduces additional noise, it may increase the variance 
in the output of any given node if crosstalk is not strong enough to make up 
for the introduced noise (Figure 12). 

6.3 Parallel Pathways 

We consider two pathways with crosstalk among more than one of their 
nodes. We distinguish two cases, when the two pathways have different or 
the same outputs. In the first case, since the two outputs are independent, 
it is easier to reduce the noise variance in both of them, by "exchanging" 
their noise through each node, assuming that the different noise sources are 
independent. When the output is the same, there is little reduction in the 
output variance from crosstalk, since every disturbance eventually reaches the 
output, and is combined with other correlated versions of the same signal, 
as shown in Figure 13. The variance reduction in this case is caused by 
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the increase of the effective pathway lengths, since they follow on average a 
longer path towards the output. 



[Figure 13 panels: pathway diagrams with noise inputs and outputs; horizontal axes, amount of crosstalk.]

Figure 13: Output variance when crosstalk is present among all stages of two 
different pathways for various pathway lengths, when their output is different 
(left) or the same (right). The output variances are normalized by the variance 
of a pathway without crosstalk. We assume that every stage of the pathway has 
some noise input. A small amount of crosstalk can help reduce the effect of noise in 
the output, but more crosstalk does not help filtering out the noise of the system. 
Crosstalk has a much smaller effect when the two pathways have the same output. 
Although it reduces the variance of the intermediate nodes, it creates correlations
among them, which in turn increase the variance in the output.



6.4 Crosstalk Modeling: Direct Conversion and Intermediate Nodes

Suppose we have a simple decomposed system: 

\[
\begin{aligned}
dY_1 &= -aY_1\,dt + \sigma\,dU_t, \\
dY_2 &= -aY_2\,dt + \sigma\,dW_t.
\end{aligned}
\tag{70}
\]

The two outputs of the system are completely independent, since they do 
not interact in any way, and therefore are uncorrelated. The variance of each 
output is:

\[
V[Y_1] = \sigma_Y^2 = \frac{\sigma^2}{2\pi}\int_{-\infty}^{+\infty}\frac{d\omega}{\omega^2 + a^2} = \frac{\sigma^2}{2a}. \tag{71}
\]

The system is symmetric, thus $V[Y_1] = V[Y_2] = \sigma_Y^2$. If there is crosstalk, then
the different states of the system are correlated. If we model crosstalk as a 
positive conversion rate from one state to another, with the conversion rates 
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Figure 14: Output variation for each node in a system of N nodes, when there 
are crosstalk interactions among every pair of nodes. The variance has been nor- 
malized by the corresponding variance without crosstalk. Each node is identical, 
and receives an independent noise input of the same intensity. When the number 
of vertices increases, the noise is distributed among all the nodes, thus the output 
variance is reduced.

being equal among every pair of states, the 2-state system above becomes:

\[
\begin{aligned}
dY_1 &= -(a+c)Y_1\,dt + cY_2\,dt + \sigma\,dU_t \\
dY_2 &= -(a+c)Y_2\,dt + cY_1\,dt + \sigma\,dW_t.
\end{aligned}
\tag{72}
\]

The variance of each of the outputs now becomes:

\[
V[Y_1] = \sigma^2\int_{0}^{\infty}\left(h_{11}^2(t) + h_{21}^2(t)\right)dt, \tag{73}
\]

where $h_{11}$ and $h_{21}$ are the impulse responses of the first node when an
impulse is applied to the first and second node respectively. The symmetry
is preserved, so $V[Y_1] = \psi_Y = V[Y_2]$. The variance when crosstalk is present
(c > 0) is always smaller than the initial variance of the outputs. Generalizing
the equations above for N nodes (see Figure 14), we find that
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and as a result, 
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(74) 



(75) 



which tends to zero as N becomes large.
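The N-node behaviour can be illustrated numerically under one explicit assumption: that system (72) generalizes to a conversion rate c between every pair of nodes, so that the drift matrix has -(a + (N-1)c) on the diagonal and c elsewhere. That matrix is symmetric with one eigenvalue -a (the mean mode) and N-1 eigenvalues -(a + Nc), which gives a per-node variance of (σ²/2) [1/(N a) + (N-1)/(N (a + N c))]. The Python sketch below evaluates this expression and checks it by integrating the covariance equation; the function names and parameter values are hypothetical.

```python
def variance_all_to_all(a, c, sigma, n_nodes):
    """Per-node steady-state variance when every pair of nodes converts
    into each other at rate c (assumed generalization of system (72))."""
    mean_mode = 1.0 / (n_nodes * a)                        # eigenvalue -a
    other_modes = (n_nodes - 1.0) / (n_nodes * (a + n_nodes * c))
    return 0.5 * sigma ** 2 * (mean_mode + other_modes)


def variance_by_integration(a, c, sigma, n_nodes, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of dP/dt = A P + P A^T + sigma^2 I,
    returning the variance of the first node."""
    n = n_nodes
    A = [[c if i != j else -(a + (n - 1) * c) for j in range(n)]
         for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for _ in range(int(t_end / dt)):
        AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        # A and P are symmetric, so P A^T is the transpose of A P.
        P = [[P[i][j] + dt * (AP[i][j] + AP[j][i]
                              + (sigma ** 2 if i == j else 0.0))
              for j in range(n)] for i in range(n)]
    return P[0][0]


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 64):
        print(n, variance_all_to_all(1.0, 0.5, 1.0, n))
```

For N = 1 the expression reduces to σ²/(2a), the value without crosstalk, and both terms vanish as N grows, in line with the limit stated above.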

Alternatively, we can model crosstalk interactions as two species being 
converted to an intermediate complex, as has been done in the previous sec- 
tions. A very simple example of a chemical reaction network which demon- 
strates this type of behavior is 



\[
Y_1 + Y_2 \underset{f}{\overset{c}{\rightleftharpoons}} Z. \tag{76}
\]

Crosstalk is defined by the presence of the last reaction. We are interested
in the variance in the concentration of the output products A and B, which
are directly affected by the variance of $Y_1$ and $Y_2$. The two pathways will
interact through an intermediate vertex. The system can be written as

\[
\begin{aligned}
dY_1 &= -aY_1\,dt - cY_1Y_2\,dt + fZ\,dt + \sigma\,dU_t \\
dY_2 &= -aY_2\,dt - cY_1Y_2\,dt + fZ\,dt + \sigma\,dW_t \\
dZ &= cY_1Y_2\,dt - fZ\,dt.
\end{aligned}
\tag{77}
\]
We assume that there is a new "crosstalk vertex" Z among each pair of
original vertices. After linearizing around an equilibrium point
$(\bar{Y}_1, \bar{Y}_2, \bar{Z})$, these equations become

\[
\begin{aligned}
dY_1 &= -(a + c\bar{Y}_2)Y_1\,dt - c\bar{Y}_1Y_2\,dt + fZ\,dt + \sigma\,dU_t \\
dY_2 &= -(a + c\bar{Y}_1)Y_2\,dt - c\bar{Y}_2Y_1\,dt + fZ\,dt + \sigma\,dW_t \\
dZ &= c\bar{Y}_2Y_1\,dt + c\bar{Y}_1Y_2\,dt - fZ\,dt.
\end{aligned}
\tag{78}
\]

We find that this network is now more capable of reducing the effect of noise 
in the output (Figure 15). 

7 Multiplicative Noise 

There are cases where the noise intensity is proportional to a state of the sys- 
tem. In biological networks for example, the degradation of various proteins 
depends on specific enzymes, whose concentration may be subject to random 
fluctuations. This makes the degradation of a protein prone to noise whose 
source is independent of the protein concentration, but makes the rate at 
which it degrades proportional to it. The noise intensity is also proportional 
to the state of the system when a state is autoregulated, either with posi- 
tive or negative feedback, where the rate at which the concentration of that 
particular state changes is subject to random noise. We will call this type of 
noise multiplicative, because it is multiplied by the state of the system. As a 
specific example, consider a gene that is regulated by a single regulator [14]. 
The transcription interaction can be written as 

P -> X. (79) 

When P is in its active form, gene X starts being transcribed and the mRNA
is translated, resulting in accumulation of protein X at a constant rate b. The
production of X is balanced by protein degradation (by other specialized
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Figure 15: Comparison of the noise in the output of a simple network with two 
different implementations of crosstalk, direct conversion or forming a new complex, 
as described by equations (72) and (77). 

proteins) and cell dilution during growth with rate a. A differential equation
that describes this simple system is

\[
\frac{dX}{dt} = b - aX. \tag{80}
\]



If there is noise in the concentration of the aforementioned degradation
proteins, or the cell growth, the rate $a_t$ is not constant, but it consists of a
deterministic component, and a random component. We will now show that
noise in the production rate $b_t$ has a fundamentally different effect on system
behavior compared to the effect of noise in the degradation rate $a_t$, because
the latter is multiplied by the concentration of the protein itself. We will first
study the homogeneous version of the differential equation (80), and then we
will add the constant production term. Ignoring the constant production
term, and multiplying by dt, equation (80) becomes

\[
dX = -aX\,dt. \tag{81}
\]
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After adding a random component in the degradation rate, the last equation 
becomes 

\[
dX = \left(-a_t\,dt + \sigma_t\,dW_t\right)X, \tag{82}
\]

where $W_t$ is the regular Wiener process and $dW_t$ represents the noise term.
Note that the degradation rate and the noise intensity are allowed to be time- 
dependent. We will first find the differential of the logarithm of X using Ito's 
lemma. We will again require that all the input functions are continuous and 
non-pathological, so that we can always change the order of taking the limit 
and the expectation operator. We will additionally assume that all integrals 
are finite, so that we can also change the order of integration. The technical 
details mentioned above are covered in more detail in [ ] and [15]. 

We apply Ito's lemma to the logarithm of the random variable X, which
obeys equation (82). Setting

\[
f(X,t) = \log X(t), \tag{83}
\]

we get

\[
\begin{aligned}
d\log(X) &= df(X,t) \\
&= 0 + \frac{dX}{X} - \frac{1}{2X^2}\left(a_t^2X^2\,dt^2 - 2a_t\sigma_tX^2\,dt\,dW_t + \sigma_t^2X^2\,dW_t^2\right) \\
&= \left(-a_t\,dt + \sigma_t\,dW_t - \tfrac{1}{2}\sigma_t^2\,dW_t^2\right) - \tfrac{1}{2}\left(a_t^2\,dt^2 - 2a_t\sigma_t\,dt\,dW_t\right).
\end{aligned}
\tag{84}
\]

The last two terms can be neglected, since $dt^2 = o(dt)$ and $dt \cdot dW_t = o(dt)$
as $dt \to 0$. On the other hand, as dt becomes small,

\[
\lim_{dt\to 0} dW_t^2 = E[dW_t^2] = dt. \tag{85}
\]

Applying the rules above to equation (84),

\[
\log\frac{X_t}{X_0} = -\int_{0}^{t}\left(a_s + \tfrac{1}{2}\sigma_s^2\right)ds + \sigma_tW_t. \tag{86}
\]

We can now solve for X(t):

\[
X(t) = X_0\,e^{-\int_{0}^{t}\left(a_s + \frac{1}{2}\sigma_s^2\right)ds}\cdot e^{\sigma_tW_t}. \tag{87}
\]

The above derivation is valid only when the equilibrium state (concentration)
is equal to zero and we start from a state $X_0 \neq 0$. If the rate a and the noise
strength σ are constant, it simplifies to

\[
X(t) = X_0\,e^{-\left(a + \frac{1}{2}\sigma^2\right)t}\cdot e^{\sigma W_t}. \tag{88}
\]
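The constant-coefficient solution (88) can be compared with a direct simulation of equation (82) along the same Brownian path. The Python sketch below uses the Euler-Maruyama scheme, which is a generic discretization and not a method taken from the text; the function name, step count, seed and parameter values are all illustrative assumptions.

```python
import math
import random


def simulate_homogeneous(a, sigma, x0, t_end, steps, seed=0):
    """Simulate dX = (-a dt + sigma dW) X with the Euler-Maruyama scheme
    and evaluate the closed form (88) on the same Brownian path."""
    rng = random.Random(seed)
    dt = t_end / steps
    x_em, w = x0, 0.0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment
        x_em += x_em * (-a * dt + sigma * dw)
        w += dw                              # accumulated W_t
    x_exact = x0 * math.exp(-(a + 0.5 * sigma ** 2) * t_end) * math.exp(sigma * w)
    return x_em, x_exact


if __name__ == "__main__":
    x_em, x_exact = simulate_homogeneous(a=1.0, sigma=0.5, x0=1.0,
                                         t_end=1.0, steps=20_000)
    print(x_em, x_exact)
```

The two values agree up to the discretization error of the scheme, which shrinks as the number of steps increases. Dropping the extra term σ²/2 from the exponent would leave a systematic mismatch, which is the Ito correction in (86).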

When the equilibrium is positive (which is the case for most systems), the 
following differential equation is more relevant: 

\[
dY = b\,dt + \left(-a\,dt + \sigma\,dW_t\right)Y. \tag{89}
\]

One way to view the terms on the right hand side of equation (89) is that 
the concentration of species X depends on a deterministic input, and is 
regulated by a negative feedback mechanism which is subject to random 
disturbances. It has been shown in [16] that when feedback is also noisy,
there are fundamental limits on how much the noise in the output can be
reduced, because there are bounds on how well we can estimate the state
of the system. In [16] the authors focus on discrete random events
(birth-death processes) as the source of noise, and the result is that feedback noise
makes it harder to control the noise in the output. We will also show that in
our setting multiplicative noise results in larger variance than additive noise
of equal strength, and in the next section we will show how it propagates in a
cascade of linear filters.

Using Ito's lemma once more, and the solution to the homogeneous equa- 
tion, we find that the solution to the nonhomogeneous case is 

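\[
\begin{aligned}
Y(t) &= Y_0X(t) + bX(t)\int_{0}^{t}X^{-1}(s)\,ds \\
&= Y_0\,e^{-\int_{0}^{t}\left(a_u + \frac{1}{2}\sigma_u^2\right)du}\cdot e^{\sigma_tW_t} + b\int_{0}^{t}e^{-\int_{s}^{t}\left(a_u + \frac{1}{2}\sigma_u^2\right)du}\cdot e^{\sigma_tW_t - \sigma_sW_s}\,ds,
\end{aligned}
\tag{90}
\]

where X(t) is the solution of the homogeneous equation (87) with initial
condition X(t = 0) = 1.

If the initial state is equal to zero (or when t is large), and all the
parameters are constant, then we can simplify the last expression as

\[
Y(t) = b\int_{0}^{t}e^{-\left(a + \frac{1}{2}\sigma^2\right)u}\cdot e^{\sigma W_u}\,du. \tag{91}
\]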

Note that the form of the last equation is fundamentally different from the 
response of linear systems to input noise, because here the Wiener process 
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input depends on the same time variable as the kernel of the integral. In 
other words, the output is not a convolution of the impulse response of the 
system with the input. In order to see how the noise propagates through the 
network, and given that we cannot use the solution (22), it is helpful to find 
the correlation of two versions of this stochastic process, so that we find its 
frequency content. 

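As a first step, we will compute the correlation of the exponential of
Brownian motion, $Z_t = e^{\sigma W_t}$. The expected value is

\[
E[Z_t] = E\left[e^{\sigma W_t}\right] = e^{\frac{1}{2}\sigma^2 t}. \tag{92}
\]

The expected value of the square of the exponential Wiener process is

\[
E[Z_t^2] = E\left[e^{2\sigma W_t}\right] = e^{2\sigma^2 t}. \tag{93}
\]

Combining the last two equations:

\[
\sigma_{Z_t}^2 = \mathrm{Var}[Z_t] = E[Z_t^2] - \left(E[Z_t]\right)^2 = e^{2\sigma^2 t} - e^{\sigma^2 t}. \tag{94}
\]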




The expected value of Y(t) in equation (91) can now be computed:

\[
E[Y(t)] = b\int_{0}^{t}e^{-\left(a + \frac{1}{2}\sigma^2\right)u}\cdot E\left[e^{\sigma W_u}\right]du = b\int_{0}^{t}e^{-au}\,du = \frac{b}{a}\left(1 - e^{-at}\right), \tag{95}
\]

which means that

\[
\bar{Y} = \lim_{t\to\infty}E[Y(t)] = \frac{b}{a}. \tag{96}
\]

As one would expect, it is the same as when the system is completely
deterministic. Next, we need to compute the covariance of two realizations of the
random process $Z_t$:

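\[
\begin{aligned}
\mathrm{Cov}[Z_s, Z_t] &= E[Z_s \cdot Z_t] - E[Z_s]\cdot E[Z_t] \\
&= E\left[e^{\sigma W_s}e^{\sigma W_t}\right] - E\left[e^{\sigma W_s}\right]\cdot E\left[e^{\sigma W_t}\right] \\
&= E\left[e^{\sigma\left(W_{s\wedge t} + W_{s\vee t}\right)}\right] - e^{\frac{1}{2}\sigma^2(s+t)} \\
&= E\left[e^{2\sigma W_{s\wedge t}}\,e^{\sigma\left(W_{s\vee t} - W_{s\wedge t}\right)}\right] - e^{\frac{1}{2}\sigma^2(s+t)} \\
&= E\left[e^{2\sigma W_{s\wedge t}}\right]\cdot E\left[e^{\sigma\left(W_{s\vee t} - W_{s\wedge t}\right)}\right] - e^{\frac{1}{2}\sigma^2(s+t)} \\
&= e^{2\sigma^2(s\wedge t)}\cdot e^{\frac{1}{2}\sigma^2(s\vee t - s\wedge t)} - e^{\frac{1}{2}\sigma^2(s+t)},
\end{aligned}
\tag{97}
\]

where we follow the standard notation $s \wedge t = \min(s, t)$ and $s \vee t = \max(s, t)$.
Combining all the equations above, we can find the correlation for the
geometric Brownian motion:

\[
\begin{aligned}
R(s,t) &= \mathrm{Corr}[Z_s, Z_t] = \frac{\mathrm{Cov}[Z_s, Z_t]}{\sigma_{Z_s}\sigma_{Z_t}} \\
&= \frac{e^{2\sigma^2(s\wedge t)}\cdot e^{\frac{1}{2}\sigma^2(s\vee t - s\wedge t)} - e^{\frac{1}{2}\sigma^2(s+t)}}{\sqrt{e^{\sigma^2 s}\left(e^{\sigma^2 s} - 1\right)}\sqrt{e^{\sigma^2 t}\left(e^{\sigma^2 t} - 1\right)}} \\
&= \sqrt{\frac{e^{\sigma^2(s\wedge t)} - 1}{e^{\sigma^2(s\vee t)} - 1}}.
\end{aligned}
\tag{98}
\]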

We now define the covariance and correlation of two such processes with time
lag $\tau$ in the equilibrium state as:

$$C(\tau) = \lim_{t\to\infty} C(t, t + \tau) \quad\text{and}\quad R(\tau) = \lim_{t\to\infty} R(t, t + \tau). \tag{99}$$

Applying this definition to the general correlation formula of geometric
Brownian motion,

$$R(\tau) = \lim_{t\to\infty} \sqrt{\frac{e^{\sigma^2 t} - 1}{e^{\sigma^2 (t+\tau)} - 1}} = e^{-\frac{1}{2}\sigma^2 \tau}. \tag{100}$$

So the correlation is exponentially decreasing as a function of the time lag.

We can now follow the same procedure in order to find the correlation of
the stochastic process defined by equation (91).
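
As a quick numerical illustration of the geometric Brownian motion correlation, the sketch below evaluates the two-time correlation $\sqrt{(e^{\sigma^2 (s\wedge t)} - 1)/(e^{\sigma^2 (s\vee t)} - 1)}$ at a large but finite time and compares it with the exponential limit $e^{-\sigma^2\tau/2}$. The function names and parameter values are illustrative choices, not part of the original analysis.

```python
import math


def gbm_correlation(s: float, t: float, sigma: float) -> float:
    """Correlation of exp(sigma*W_s) and exp(sigma*W_t) for s, t > 0."""
    lo, hi = min(s, t), max(s, t)
    # expm1(x) = e^x - 1, evaluated accurately even when x is small.
    return math.sqrt(math.expm1(sigma**2 * lo) / math.expm1(sigma**2 * hi))


def stationary_correlation(tau: float, sigma: float) -> float:
    """Large-t limit of gbm_correlation(t, t + tau, sigma)."""
    return math.exp(-0.5 * sigma**2 * abs(tau))


if __name__ == "__main__":
    sigma = 0.8  # hypothetical noise intensity
    for tau in (0.5, 1.0, 2.0):
        finite = gbm_correlation(30.0, 30.0 + tau, sigma)
        print(tau, finite, stationary_correlation(tau, sigma))
```

For moderate values of $\sigma^2 t$ the two columns already agree to many digits, which is consistent with the correlation depending only on the lag once the transient has died out.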

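Its second moment is equal to

$$\begin{aligned}
E[Y^2(t)] &= b^2 \int_0^t\!\!\int_0^t e^{-\left(a+\frac{\sigma^2}{2}\right)(x+y)} \cdot E\left[e^{\sigma W_x} e^{\sigma W_y}\right] dx\,dy \\
&= b^2 \int_0^t\!\!\int_0^t e^{-\left(a+\frac{\sigma^2}{2}\right)(x+y)}\, e^{2\sigma^2 (x\wedge y)}\, e^{\frac{1}{2}\sigma^2 (x\vee y - x\wedge y)}\, dx\,dy \\
&= b^2 \int_0^t\!\!\int_0^x e^{-\left(a+\frac{\sigma^2}{2}\right)(x+y)}\, e^{2\sigma^2 y}\, e^{\frac{1}{2}\sigma^2 (x-y)}\, dy\,dx \\
&\quad + b^2 \int_0^t\!\!\int_x^t e^{-\left(a+\frac{\sigma^2}{2}\right)(x+y)}\, e^{2\sigma^2 x}\, e^{\frac{1}{2}\sigma^2 (y-x)}\, dy\,dx \\
&= 2b^2\, \frac{a\left(1 - 2e^{-at} + e^{t(-2a+\sigma^2)}\right) + \left(-1 + e^{-at}\right)\sigma^2}{a\left(2a^2 - 3a\sigma^2 + \sigma^4\right)}
\end{aligned} \tag{101}$$

where we have assumed that all integrals are finite, which means that the
rate $a$ has to be greater than $\frac{\sigma^2}{2}$. As $t$ goes to infinity
we can ignore all the decaying exponentials:

$$\lim_{t\to\infty} E[Y^2(t)] = \begin{cases} \infty & \text{if } a \le \frac{\sigma^2}{2} \\[4pt] \dfrac{2b^2}{a(2a - \sigma^2)} & \text{if } a > \frac{\sigma^2}{2}. \end{cases} \tag{102}$$

In what follows, we will only be interested in the behavior of the system
when $a > \frac{\sigma^2}{2}$, because it only makes sense to compute the
correlation when the standard deviation is finite.

Based on equations (96) and (102), the variance (when it is defined) is
equal to

$$\mathbb{V}[Y] = \frac{b^2\sigma^2}{a^2(2a - \sigma^2)} = \frac{\sigma^2}{2a - \sigma^2}\, \bar{Y}^2. \tag{103}$$

The standard deviation is therefore proportional to the average value of Y,
since the larger the value of Y, the larger the strength of the disturbance.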
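
The closed form for the second moment can be sanity-checked against a direct numerical evaluation of the double integral. The following is a minimal sketch under the assumption that the integrand is $e^{-(a+\sigma^2/2)(x+y)}\,E[e^{\sigma W_x}e^{\sigma W_y}]$ with $E[e^{\sigma W_x}e^{\sigma W_y}] = e^{2\sigma^2 (x\wedge y)}e^{\sigma^2 (x\vee y - x\wedge y)/2}$; the function names, the midpoint rule and the grid size are illustrative choices.

```python
import math


def second_moment(t, a, b, sigma):
    """Closed form of E[Y^2(t)]; singular at a == sigma^2 and a == sigma^2/2."""
    s2 = sigma**2
    num = a * (1 - 2 * math.exp(-a * t) + math.exp(-(2 * a - s2) * t))
    num += (math.exp(-a * t) - 1) * s2
    return 2 * b**2 * num / (a * (2 * a**2 - 3 * a * s2 + s2**2))


def second_moment_quadrature(t, a, b, sigma, n=300):
    """Midpoint-rule estimate of the double integral over [0, t]^2."""
    s2, h, total = sigma**2, t / n, 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            lo, hi = min(x, y), max(x, y)
            total += math.exp(-(a + s2 / 2) * (x + y) + 2 * s2 * lo + s2 * (hi - lo) / 2)
    return b**2 * total * h * h


def steady_state_variance(a, b, sigma):
    """Limit of E[Y^2(t)] - (b/a)^2; infinite unless a > sigma^2 / 2."""
    s2 = sigma**2
    return math.inf if 2 * a <= s2 else b**2 * s2 / (a**2 * (2 * a - s2))
```

With, for example, `a = 1`, `b = 1` and `sigma = 0.5`, the quadrature and the closed form should agree to within the discretization error, and the variance stays finite because $a > \sigma^2/2$.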



7.1 Multiplicative Noise Through a Low-Pass Filter 

Assume that a pathway consists of two nodes. The first one is affected by
multiplicative noise, and it is used as an input to the second node. We first
analyze a system where each state has a single real pole, and later on we will
generalize it for an arbitrary number of poles. The equations of the system
are

$$\begin{aligned}
dX &= c\,dt + (-f\,dt + \sigma\,dW_t)X \\
dY &= (bX - aY)\,dt.
\end{aligned} \tag{104}$$

Combining the forms for the multiplicative noise and the output of a single
pole filter,

$$Y(t) = bc\,e^{-at} \int_0^t e^{as} \left( \int_0^s e^{-\left(f+\frac{\sigma^2}{2}\right)u}\, e^{\sigma W_u}\, du \right) ds. \tag{105}$$

The mean is equal to:

$$\begin{aligned}
E[Y(t)] &= bc\,e^{-at} \int_0^t e^{as} \left( \int_0^s e^{-\left(f+\frac{\sigma^2}{2}\right)u}\, E\left[e^{\sigma W_u}\right] du \right) ds \\
&= bc\,e^{-at} \int_0^t e^{as} \left( \int_0^s e^{-fu}\, du \right) ds \\
&= \frac{bc\left(a - a e^{-ft} + \left(-1 + e^{-at}\right) f\right)}{a(a-f)f}.
\end{aligned} \tag{106}$$

The last equation also holds when $a = f$, in which case we find the expected
value by taking the limit as $f \to a$. Letting the time $t$ go to infinity,

$$E[Y] = \lim_{t\to\infty} E[Y(t)] = \frac{bc}{af} \tag{107}$$

which is exactly the same as an equivalent system without any noise.
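The second moment is

$$E[Y^2(t)] = b^2c^2 e^{-2at} \int_0^t e^{ar} dr \int_0^t e^{as} ds \int_0^r\!\!\int_0^s e^{-\left(f+\frac{\sigma^2}{2}\right)(x+y)}\, E\left[e^{\sigma(W_x + W_y)}\right] dy\,dx. \tag{108}$$

We break the integral above in five parts, in order to compute the expected
value inside it:

$$\begin{aligned}
\frac{e^{2at}}{b^2c^2}\, E[Y^2(t)] ={}& \int_0^t e^{ar} dr \int_r^t e^{as} ds \int_0^r \left( \int_0^x e^{-fx} e^{-fy} e^{\sigma^2 y}\, dy \right) dx \\
&+ \int_0^t e^{ar} dr \int_r^t e^{as} ds \int_0^r \left( \int_x^s e^{-fx} e^{-fy} e^{\sigma^2 x}\, dy \right) dx \\
&+ \int_0^t e^{ar} dr \int_0^r e^{as} ds \int_0^s \left( \int_0^x e^{-fx} e^{-fy} e^{\sigma^2 y}\, dy \right) dx \\
&+ \int_0^t e^{ar} dr \int_0^r e^{as} ds \int_0^s \left( \int_x^s e^{-fx} e^{-fy} e^{\sigma^2 x}\, dy \right) dx \\
&+ \int_0^t e^{ar} dr \int_0^r e^{as} ds \int_s^r \left( \int_0^s e^{-fx} e^{-fy} e^{\sigma^2 y}\, dy \right) dx.
\end{aligned} \tag{109}$$

After performing all the algebraic calculations,

$$E[Y^2] = \lim_{t\to\infty} E[Y^2(t)] = \frac{2b^2c^2}{a^2 f (2f - \sigma^2)} \tag{110}$$

given that the second moment is finite, which happens when
$f > \frac{\sigma^2}{2}$. The variance is

$$\mathbb{V}[Y] = \frac{b^2c^2\sigma^2}{a^2 f^2 (2f - \sigma^2)}. \tag{111}$$

We can write the above equation as a constant times the variance of the first
state:

$$\mathbb{V}[Y] = \frac{b^2}{a^2}\, \mathbb{V}[X], \qquad \mathbb{V}[X] = \frac{c^2\sigma^2}{f^2 (2f - \sigma^2)}. \tag{112}$$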
The variance of Y is fundamentally different from the variance in the case
when white noise is added directly to the input, in which case, it would be
equal to

$$\mathbb{V}_0[Y] = \frac{b^2}{2a}\,\sigma_X^2. \tag{113}$$

The time evolution of the variance is shown in Figure 16. When the noise
is multiplicative, it takes longer for the variance to settle to its steady state
value, which is also an indication that the output variance consists of lower
frequencies than in the case of additive noise.

Figure 16: Evolution of the output variance of a single pole filter when the input is
affected by additive and multiplicative noise respectively. The system with additive
noise has less variance in the output compared to the one with multiplicative noise.
Also, in the case of geometric noise, the variance takes more time to settle to its
equilibrium value.
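
The steady-state statistics of the two-node pathway can be collected in a few lines of code. The sketch below assumes the transient mean $bc\,(a - ae^{-ft} + (e^{-at} - 1)f)/(a(a-f)f)$ and a steady-state output variance equal to $(b/a)^2$ times the variance $c^2\sigma^2/(f^2(2f-\sigma^2))$ of the first state; the function names are hypothetical, and the $a = f$ branch uses the limit of the general expression.

```python
import math


def mean_output(t, a, b, c, f):
    """E[Y(t)] for the two-node pathway with zero initial conditions."""
    if math.isclose(a, f, rel_tol=1e-9):
        # Limit f -> a of the general expression.
        return b * c * (1 - math.exp(-a * t) * (1 + a * t)) / a**2
    num = a - a * math.exp(-f * t) + (math.exp(-a * t) - 1) * f
    return b * c * num / (a * (a - f) * f)


def steady_state_stats(a, b, c, f, sigma):
    """Steady-state (mean, variance) of Y; the variance is finite only if 2f > sigma^2."""
    mean = b * c / (a * f)
    if 2 * f <= sigma**2:
        return mean, math.inf
    var_x = c**2 * sigma**2 / (f**2 * (2 * f - sigma**2))  # variance of the first state
    return mean, (b / a) ** 2 * var_x
```

Note that the mean does not depend on `sigma` at all, whereas the variance grows without bound as `sigma**2` approaches `2 * f`.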



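More generally, if we pass the output of the multiplicative noise through
an arbitrary linear filter with impulse response $h(t)$ then the output is
defined as the convolution of the impulse response and the input:

$$Y(t) = \int_0^t h(t-s)\, X(s)\, ds. \tag{114}$$

The mean is

$$E[Y(t)] = \frac{c}{f} \int_0^t \left(1 - e^{-fs}\right) h(t-s)\, ds. \tag{115}$$

The variance is equal to

$$\begin{aligned}
\mathbb{V}[Y(t)] ={}& E[Y^2(t)] - \left(E[Y(t)]\right)^2 \\
={}& c^2 \int_0^t h(t-r)\,dr \int_r^t h(t-s)\,ds \int_0^r\!\!\int_0^x e^{-f(x+y)} e^{\sigma^2 y}\, dy\,dx \\
&+ c^2 \int_0^t h(t-r)\,dr \int_r^t h(t-s)\,ds \int_0^r\!\!\int_x^s e^{-f(x+y)} e^{\sigma^2 x}\, dy\,dx \\
&+ c^2 \int_0^t h(t-r)\,dr \int_0^r h(t-s)\,ds \int_0^s\!\!\int_0^x e^{-f(x+y)} e^{\sigma^2 y}\, dy\,dx \\
&+ c^2 \int_0^t h(t-r)\,dr \int_0^r h(t-s)\,ds \int_0^s\!\!\int_x^s e^{-f(x+y)} e^{\sigma^2 x}\, dy\,dx \\
&+ c^2 \int_0^t h(t-r)\,dr \int_0^r h(t-s)\,ds \int_s^r\!\!\int_0^s e^{-f(x+y)} e^{\sigma^2 y}\, dy\,dx \\
&- \frac{c^2}{f^2} \left( \int_0^t \left(1 - e^{-fs}\right) h(t-s)\, ds \right)^2.
\end{aligned} \tag{116}$$

For example, if the filter has one pole at $-a$ with $a > 0$, then
$h(t, s) = e^{-a(t-s)}$, and we can verify that the mean and the variance are
equal to the ones found in equations (107) and (110).

If we have $n$ identical single-pole filters in series, with the same pole at
$-a$, with $a \in \mathbb{R}$, and their input is multiplied by $b$, then the
mean is

$$E[Y] = \lim_{t\to\infty} \frac{b^n c}{f} \int_0^t \left(1 - e^{-fs}\right) \frac{(t-s)^{n-1}}{(n-1)!}\, e^{-a(t-s)}\, ds = \frac{b^n c}{a^n f} \tag{117}$$

and the variance is equal to

$$\mathbb{V}[Y] = \frac{b^{2n}}{a^{2n}} \cdot \frac{c^2\sigma^2}{f^2 (2f - \sigma^2)}. \tag{118}$$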

The above results show how variation that enters the system through
noisy degradation rates affects the output of a given pathway. For example,
in the two-step cascade

$$\begin{aligned} X &\to Y \\ Y &\to Z \end{aligned} \tag{119}$$

described by (104), species Y is affected by multiplicative noise, and then is
used as an input to the next reaction that produces Z. The second reaction
acts as a first order linear filter, and the noise propagates to the pathway
output Z. The analysis can be used for any system that can be described
by linear differential equations. If a linear time invariant system is described
by (2) and there is noise in the input $u$ or in its input matrix $B$, then we
can treat the noise as an additional input as in equation (14), and solve it
accordingly. The same holds for the off-diagonal elements of the dynamical
matrix $A$. But noise in the diagonal elements of $A$ is multiplicative noise. It
needs to be considered separately from all other noise sources, and it leads
to qualitatively different behavior than the previous kinds of input noise.

8 Noise Propagation in Chemical Reaction 
Networks 

In this section, we will examine how noise propagates in general linear chem- 
ical reaction networks. Noise in chemical reaction networks that do not in- 
volve bimolecular or higher order reactions has been studied extensively (see 
for example [17]) and chemical reactions have also been analyzed as analog 
signal processing systems [18]. In this section, we will study reactions where 
two or more reactants are noisy, and their disturbances may be correlated 
with each other. 

8.1 Motivating example 

Consider the following reaction:

$$X + Y \to Z. \tag{120}$$

Further assume that the concentrations of X and Y are subject to random
white noise fluctuations around a deterministic mean value:

$$\begin{aligned}
X_t &= X_0 + \sigma_X\, dU_t \\
Y_t &= Y_0 + \sigma_Y\, dW_t
\end{aligned} \tag{121}$$

and Z degrades with a rate proportional to its concentration. The
corresponding stochastic differential equation is

$$dZ = (X_0 Y_0 - aZ_t)\,dt + X_0\sigma_Y\, dW_t + Y_0\sigma_X\, dU_t + \sigma_X\sigma_Y\, d[U_t, W_t] \tag{122}$$

where $U_t$ and $W_t$ are standard Brownian motions. Equation (122) is a natural
generalization of the case where we have one or more noise terms that are
added to the deterministic differential equation. In all stochastic differential
equations so far, we multiply the deterministic factors that contribute to
the infinitesimal change in the state of the system by $dt$, and then we add
the noise terms. When we have a product of two noisy inputs, we will
first consider the noiseless case, and then add all the noise terms, and their
products as well. In equation (122) the deterministic term is equal to $X_0 Y_0$
and the noise terms that are added are equal to $X_t Y_t - X_0 Y_0$. The term
$dU_t\, dW_t = d[U_t, W_t]$ is the differential of the quadratic covariation
process of $U_t$ and $W_t$. If the two processes have correlation $\rho$, then

$$d[U_t, W_t] = \rho\, dt. \tag{123}$$

Simplifying the last expression for $dZ$,

$$\begin{aligned}
dZ &= (X_0 Y_0 - aZ_t)\,dt + X_0\sigma_Y\, dW_t + Y_0\sigma_X\, dU_t + \rho\sigma_X\sigma_Y\, dt \\
&= (X_0 Y_0 + \rho\sigma_X\sigma_Y - aZ_t)\,dt + X_0\sigma_Y\, dW_t + Y_0\sigma_X\, dU_t
\end{aligned} \tag{124}$$

which is the familiar Ornstein-Uhlenbeck process with two noise sources.
The final expression for the concentration of Z is

$$Z(t) = \frac{1}{a}\left(X_0 Y_0 + \rho\sigma_X\sigma_Y\right)\left(1 - e^{-at}\right) + \sigma_X Y_0 \int_0^t e^{-a(t-s)}\, dU_s + \sigma_Y X_0 \int_0^t e^{-a(t-s)}\, dW_s. \tag{125}$$

As the effect of the initial conditions diminishes, the mean is

$$\bar{Z} = \lim_{t\to\infty} E[Z(t)] = \frac{1}{a}\left(X_0 Y_0 + \rho\sigma_X\sigma_Y\right) \tag{126}$$

and the variance is equal to

$$\mathbb{V}[Z] = \lim_{t\to\infty} \mathbb{V}[Z(t)] = \frac{Y_0^2\sigma_X^2 + X_0^2\sigma_Y^2 + 2X_0 Y_0\,\rho\,\sigma_X\sigma_Y}{2a}. \tag{127}$$
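
The effect of the correlation on the mean and variance of Z can be illustrated with a small simulation. The following sketch assumes the Ornstein-Uhlenbeck form $dZ = (X_0Y_0 + \rho\sigma_X\sigma_Y - aZ)\,dt + X_0\sigma_Y\,dW_t + Y_0\sigma_X\,dU_t$ and uses a plain Euler-Maruyama scheme; the step size, number of paths, seed and function names are arbitrary illustrative choices.

```python
import math
import random


def z_mean(x0, y0, sx, sy, rho, a):
    """Steady-state mean of Z."""
    return (x0 * y0 + rho * sx * sy) / a


def z_variance(x0, y0, sx, sy, rho, a):
    """Steady-state variance of Z."""
    return (y0**2 * sx**2 + x0**2 * sy**2 + 2 * x0 * y0 * rho * sx * sy) / (2 * a)


def simulate_z(x0, y0, sx, sy, rho, a, t_end=8.0, dt=0.02, paths=500, seed=0):
    """Euler-Maruyama samples of Z(t_end), starting from Z(0) = 0."""
    rng = random.Random(seed)
    drift = x0 * y0 + rho * sx * sy
    steps = int(round(t_end / dt))
    root_dt = math.sqrt(dt)
    samples = []
    for _ in range(paths):
        z = 0.0
        for _ in range(steps):
            du = rng.gauss(0.0, 1.0) * root_dt
            # Build dW so that it has correlation rho with dU.
            dw = rho * du + math.sqrt(1 - rho**2) * rng.gauss(0.0, 1.0) * root_dt
            z += (drift - a * z) * dt + x0 * sy * dw + y0 * sx * du
        samples.append(z)
    return samples
```

Running `simulate_z(1, 1, 1, 1, 0.8, 1)` and averaging the samples should give a value near `z_mean(1, 1, 1, 1, 0.8, 1)`, that is 1.8 rather than the noiseless value 1, up to sampling and discretization error.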

An important consequence of correlations in the input noise ($\rho \neq 0$) is that
the mean is different from the case where there is no noise, even if both noise
terms in (121) have themselves zero mean. If the correlation is negative, the
mean is lower and vice versa. In addition, the variance is larger when there
are positive correlations in the two input noise terms, as expected. When the
correlation is negative, the two noise processes partially cancel each other,
resulting in lower variance.

8.2 General Reactions

We can generalize the above results to general reactions of the form

$$a_1 X_1 + \cdots + a_N X_N \to b_1 Y_1 + \cdots + b_M Y_M \tag{128}$$

where each of the elements of the left-hand side is assumed to be a random
variable that consists of a deterministic mean $\bar{X}_k$ and a standard white
noise process $dW_t^{(k)}$ multiplied by the standard deviation of its
concentration:

$$X_k(t) = \bar{X}_k + \sigma_k\, dW_t^{(k)}, \qquad 1 \le k \le N. \tag{129}$$



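The concentration of the product $Y_j$ is described by a stochastic differential
equation:

$$\begin{aligned}
dY_j ={}& \Bigg( b_j \prod_{u=1}^{N} \bar{X}_u - f_j Y_j \Bigg) dt + b_j \sum_{k=1}^{N} \sigma_k \Bigg( \prod_{\substack{u=1 \\ u \neq k}}^{N} \bar{X}_u \Bigg) dW_t^{(k)} \\
&+ b_j \sum_{k=1}^{N} \sum_{\substack{m=1 \\ m > k}}^{N} \sigma_k \sigma_m\, \rho_{k,m} \Bigg( \prod_{\substack{u=1 \\ u \neq k,m}}^{N} \bar{X}_u \Bigg) dt
\end{aligned} \tag{130}$$

where $\rho_{k,m}$ is the correlation between the noise sources of species $k$
and $m$, and each pair of species is counted once. The last equation is derived
by using Ito's box rule, and the fact that higher order products of Wiener
processes have variance that tends to zero faster than $dt$ as $dt \to 0$. As in
the bimolecular case, we multiply the noiseless input by $dt$, as in the
corresponding ordinary differential equation, and then we add all the noise
terms, and their products.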
The mean (disregarding initial conditions) is

$$E[Y_j] = \frac{b_j}{f_j} \Bigg( \prod_{u=1}^{N} \bar{X}_u + \sum_{k=1}^{N} \sum_{\substack{m=1 \\ m > k}}^{N} \sigma_k \sigma_m\, \rho_{k,m} \prod_{\substack{u=1 \\ u \neq k,m}}^{N} \bar{X}_u \Bigg) \tag{131}$$

which is different from the case when there is no noise, if there are correlations
among the noise terms. The last equation clearly shows that noisy inputs
can have an effect on the average of the concentration of the output, even if
their mean is zero. The amount by which they shift the mean depends on
their own variances, their correlations, and the product of concentrations of
all other reactants.

The variance is equal to

$$\mathbb{V}[Y_j] = \frac{b_j^2}{2 f_j} \Bigg( \sum_{k=1}^{N} \sigma_k^2 \prod_{\substack{u=1 \\ u \neq k}}^{N} \bar{X}_u^2 + \sum_{k<m} 2\rho_{k,m}\, \sigma_k \sigma_m\, \bar{X}_k \bar{X}_m \prod_{\substack{u=1 \\ u \neq k,m}}^{N} \bar{X}_u^2 \Bigg). \tag{132}$$

As before, positive correlations increase variance, negative correlations
reduce it, and the extent by which the correlations affect it depends on the
concentrations of the other species in the reaction.
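
To make the role of the pairwise correlations concrete, the sketch below evaluates a steady-state mean and variance for an N-reactant reaction. It is written under explicit assumptions: each unordered pair of reactants contributes once to the mean shift (so that N = 2 reduces to the bimolecular example of Section 8.1), `rho` is a symmetric correlation matrix, and all function names are hypothetical.

```python
from math import prod


def _others(xbar, skip):
    """Product of the mean concentrations whose index is not in `skip`."""
    return prod(x for u, x in enumerate(xbar) if u not in skip)


def product_mean(xbar, sigma, rho, b, f):
    """Steady-state mean: noiseless product plus one shift per correlated pair."""
    n = len(xbar)
    shift = sum(
        sigma[k] * sigma[m] * rho[k][m] * _others(xbar, (k, m))
        for k in range(n) for m in range(k + 1, n)
    )
    return b / f * (prod(xbar) + shift)


def product_variance(xbar, sigma, rho, b, f):
    """Steady-state variance from the white-noise gains of each reactant."""
    n = len(xbar)
    # gain[k] multiplies dW^(k): sigma_k times the means of all other reactants.
    gain = [sigma[k] * _others(xbar, (k,)) for k in range(n)]
    total = sum(g * g for g in gain)
    total += sum(
        2 * rho[k][m] * gain[k] * gain[m]
        for k in range(n) for m in range(k + 1, n)
    )
    return b**2 * total / (2 * f)
```

For two reactants with means 2 and 3, standard deviations 0.2 and 0.1, correlation 0.5 and unit rates, these functions return the same mean and variance as the bimolecular formulas of the motivating example.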



8.3 Reactions With Filtered Noise 

Suppose we have the following simple reaction:

$$X_1 + X_2 + \cdots + X_N \to Y \tag{133}$$

where $X_1, \ldots, X_N$ fluctuate around an average value, but the noise has already
passed through a linear filter. In this case, we can write the equation that Y
satisfies as an ordinary differential equation:

$$dY = -aY\,dt + \prod_{k=1}^{N} \left( \bar{X}_k + \sigma_k \int_0^t h_k(t-s)\, dW_s^{(k)} \right) dt \tag{134}$$

where once again $W_t^{(k)}$ is the standard Wiener process corresponding to
species $k$. Expanding the last equation,

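$$\begin{aligned}
dY ={}& \Bigg( \prod_{u=1}^{N} \bar{X}_u - aY \Bigg) dt + \sum_{k=1}^{N} \sigma_k \prod_{\substack{u=1 \\ u \neq k}}^{N} \bar{X}_u\, dt \int_0^t h_k(t-s)\, dW_s^{(k)} \\
&+ \sum_{k=1}^{N} \sum_{\substack{m=1 \\ m > k}}^{N} \sigma_k \sigma_m \prod_{\substack{u=1 \\ u \neq k,m}}^{N} \bar{X}_u\, dt \int_0^t\!\!\int_0^t h_k(t-x)\, h_m(t-y)\, dW_x^{(k)}\, dW_y^{(m)} \\
&+ o(dt).
\end{aligned} \tag{135}$$

We have omitted all the terms of higher order than $dt$ as $dt \to 0$,
gathering them under the term $o(dt)$. By using Ito's box rule again, we
can replace the products of Wiener processes by their correlation times the
infinitesimal time interval $dt$:

$$\begin{aligned}
dY ={}& \Bigg( \prod_{u=1}^{N} \bar{X}_u - aY \Bigg) dt + \sum_{k=1}^{N} \sigma_k \prod_{\substack{u=1 \\ u \neq k}}^{N} \bar{X}_u\, dt \int_0^t h_k(t-s)\, dW_s^{(k)} \\
&+ \sum_{k=1}^{N} \sum_{\substack{m=1 \\ m > k}}^{N} \sigma_k \sigma_m \prod_{\substack{u=1 \\ u \neq k,m}}^{N} \bar{X}_u\, dt \int_0^t \rho_{k,m}\, h_k(t-x)\, h_m(t-x)\, dx \\
&+ o(dt).
\end{aligned} \tag{136}$$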

Note that the second sum of integrals is deterministic and does not depend
on any Wiener process. Setting

$$\begin{aligned}
f(t) &= \sum_{k=1}^{N} \sum_{\substack{m=1 \\ m > k}}^{N} \sigma_k \sigma_m \prod_{\substack{u=1 \\ u \neq k,m}}^{N} \bar{X}_u \int_0^t \rho_{k,m}\, h_k(t-x)\, h_m(t-x)\, dx, \\
c_N &= \prod_{u=1}^{N} \bar{X}_u, \qquad \alpha_k = \sigma_k \prod_{\substack{u=1 \\ u \neq k}}^{N} \bar{X}_u \quad\text{and} \\
q_k(t) &= \alpha_k \int_0^t h_k(t-s)\, dW_s^{(k)},
\end{aligned} \tag{137}$$

the solution to the last differential equation (with zero initial conditions) is

$$Y(t) = \frac{c_N}{a}\left(1 - e^{-at}\right) + \int_0^t e^{-a(t-u)} f(u)\, du + \sum_{k=1}^{N} \int_0^t e^{-a(t-u)} q_k(u)\, du. \tag{138}$$

More generally, if the differential equation for the output has impulse
response $g(t)$, and initial condition $Y_0$,

$$Y(t) = Y_0\, g(t) + c_N \int_0^t g(t-u)\, du + \int_0^t g(t-u) f(u)\, du + \sum_{k=1}^{N} \int_0^t g(t-u)\, q_k(u)\, du \tag{139}$$

where all terms except for the last sum are deterministic. The last equation
nicely decomposes the factors that drive the output $Y(t)$. The first term is
the effect of the initial condition, the second term denotes the effect of the
mean value of the inputs, the third results from the noise correlations of the
inputs, and the last term corresponds to the sum of the random fluctuations
of all input sources.

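If the output of reaction (140) receives inputs that are affected by both
filtered and unfiltered disturbances, then we can use the same methods to
find the mean and standard deviation of the output. We will analyze the
case where we have two inputs, one of each type, since the generalization to
an arbitrary number of inputs is straightforward. Suppose that the chemical
species Y depends on species $X_1$ and $X_2$

$$X_1 + X_2 \to Y \tag{140}$$

where the inputs $X_1$ and $X_2$ are defined by the following equations:

$$X_1(t) = \bar{X}_1 + \sigma_1 \int_0^t h(t-s)\, dU_s \tag{141}$$

and

$$X_2(t) = \bar{X}_2 + \sigma_2\, dW_t \tag{142}$$

where $U_t$ and $W_t$ are standard Wiener processes with correlation $\rho$.
The stochastic differential equation for Y is

$$\begin{aligned}
dY &= (\bar{X}_1 \bar{X}_2 - aY)\,dt + \sigma_2 \bar{X}_1\, dW_t + \sigma_1 \bar{X}_2\, dt \int_0^t h(t-s)\, dU_s + \sigma_1\sigma_2\, dW_t \int_0^t h(t-s)\, dU_s \\
&= (\bar{X}_1 \bar{X}_2 + \rho h_0 \sigma_1 \sigma_2 - aY)\,dt + \sigma_2 \bar{X}_1\, dW_t + \sigma_1 \bar{X}_2\, dt \int_0^t h(t-s)\, dU_s
\end{aligned} \tag{143}$$

with $h_0 = h(0)$, since

$$dU_s\, dW_t = \begin{cases} \rho\, dt & \text{if } s = t \\ 0 & \text{otherwise.} \end{cases} \tag{144}$$

The output is equal to

$$\begin{aligned}
Y(t) ={}& Y_0 e^{-at} + \frac{1}{a}\left(\bar{X}_1 \bar{X}_2 + \rho h_0 \sigma_1 \sigma_2\right)\left(1 - e^{-at}\right) + \sigma_2 \bar{X}_1 \int_0^t e^{-a(t-s)}\, dW_s \\
&+ \sigma_1 \bar{X}_2 \int_0^t e^{-a(t-s)} \left( \int_0^s h(s-u)\, dU_u \right) ds.
\end{aligned} \tag{145}$$

The mean is

$$E[Y] = \frac{1}{a}\left(\bar{X}_1 \bar{X}_2 + \rho h_0 \sigma_1 \sigma_2\right) \tag{146}$$

which differs from the noiseless case by the last term, which is proportional
to the correlation and the standard deviation of the noise inputs. Similarly,
the variance is found to be equal to

$$\mathbb{V}[Y(t)] = V_1(t) + V_2(t) + V_{12}(t) \tag{147}$$

where

$$V_1(t) = \sigma_1^2 \bar{X}_2^2 \int_0^t\!\!\int_0^t e^{-a(2t-r-s)} \left( \int_0^{r\wedge s} h(s-u)\, h(r-u)\, du \right) dr\, ds, \tag{148}$$

$$V_2(t) = \frac{\sigma_2^2 \bar{X}_1^2}{2a}\left(1 - e^{-2at}\right) \tag{149}$$

and

$$V_{12}(t) = 2\rho\, \sigma_1 \sigma_2 \bar{X}_1 \bar{X}_2 \int_0^t e^{-a(t-y)} \left( \int_0^{t\wedge y} e^{-a(t-x)}\, h(y-x)\, dx \right) dy. \tag{150}$$

The first component $V_1(t)$ is the variance because of the noise in the first
input $dU_t$, $V_2(t)$ the variance because of noise in the second input, and the
last term $V_{12}(t)$ is the variance emanating from their correlation.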

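When the inputs $X_1$ and $X_2$ in (140) both have a filtered multiplicative
noise component, then the differential equation becomes

$$dY = -aY\,dt + \left( \lambda_1 \bar{X}_1 \int_0^t e^{-\left(\lambda_1 + \frac{\sigma_1^2}{2}\right)x}\, e^{\sigma_1 U_x}\, dx \right) \left( \lambda_2 \bar{X}_2 \int_0^t e^{-\left(\lambda_2 + \frac{\sigma_2^2}{2}\right)y}\, e^{\sigma_2 W_y}\, dy \right) dt. \tag{151}$$

In order to account for the possibly nonzero correlation between processes
$U_t$ and $W_t$, we write each of them as a sum of two uncorrelated standard
processes:

$$\begin{aligned}
U_t &= aA_t + \sqrt{1 - a^2}\, B_t \\
W_t &= bA_t + \sqrt{1 - b^2}\, C_t.
\end{aligned} \tag{152}$$

The processes $A_t$, $B_t$ and $C_t$ have correlation zero, and $\rho = ab$ is the
correlation between $U_t$ and $W_t$:

$$-1 \le a \le 1, \quad -1 \le b \le 1 \quad\text{and}\quad -1 \le \rho \le 1. \tag{153}$$

We are interested in finding the mean and variance of Y. First, we will
compute the expected value of the product of the exponentials of the two
Wiener processes $U_t$ and $W_t$:

$$\begin{aligned}
E\left[e^{\sigma_1 U_x} e^{\sigma_2 W_y}\right] &= E\left[e^{\sigma_1 (aA_x + \sqrt{1-a^2} B_x)}\, e^{\sigma_2 (bA_y + \sqrt{1-b^2} C_y)}\right] \\
&= E\left[e^{\sigma_1 a A_x + \sigma_2 b A_y}\right] \cdot E\left[e^{\sigma_1 \sqrt{1-a^2} B_x}\right] \cdot E\left[e^{\sigma_2 \sqrt{1-b^2} C_y}\right] \\
&= e^{\frac{1}{2}(a\sigma_1 + b\sigma_2)^2 (x\wedge y)}\, e^{\frac{1}{2}\left((a\sigma_1)^2 \delta_x + (b\sigma_2)^2 \delta_y\right)(x\vee y - x\wedge y)}\, e^{\frac{1}{2}\sigma_1^2 (1-a^2) x}\, e^{\frac{1}{2}\sigma_2^2 (1-b^2) y}
\end{aligned} \tag{154}$$

where $\delta$ denotes the Kronecker delta with $\delta_x = \delta(x > y)$ and
$\delta_y = \delta(y > x)$.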
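The expected value of the input of the differential equation is

$$\begin{aligned}
E[u(t)] ={}& E\left[ \left( \lambda_1 \bar{X}_1 \int_0^t e^{-\left(\lambda_1 + \frac{\sigma_1^2}{2}\right)x}\, e^{\sigma_1 U_x}\, dx \right) \left( \lambda_2 \bar{X}_2 \int_0^t e^{-\left(\lambda_2 + \frac{\sigma_2^2}{2}\right)y}\, e^{\sigma_2 W_y}\, dy \right) \right] \\
={}& \lambda_1 \lambda_2 \bar{X}_1 \bar{X}_2 \int_0^t e^{-\left(\lambda_1 + \frac{\sigma_1^2}{2}\right)x} \left( \int_0^t e^{-\left(\lambda_2 + \frac{\sigma_2^2}{2}\right)y}\, E\left[e^{\sigma_1 U_x} e^{\sigma_2 W_y}\right] dy \right) dx \\
={}& \lambda_1 \lambda_2 \bar{X}_1 \bar{X}_2 \int_0^t e^{-\lambda_1 x} \left( \int_0^x e^{-(\lambda_2 - \rho\sigma_1\sigma_2) y}\, dy \right) dx \\
&+ \lambda_1 \lambda_2 \bar{X}_1 \bar{X}_2 \int_0^t e^{-(\lambda_1 - \rho\sigma_1\sigma_2) x} \left( \int_x^t e^{-\lambda_2 y}\, dy \right) dx \\
={}& \lambda_2 \bar{X}_1 \bar{X}_2\, \frac{e^{-t\lambda_1} \left( \left(1 - e^{\rho t \sigma_2\sigma_1 - t\lambda_2}\right) \lambda_1 + \left(-1 + e^{t\lambda_1}\right) \left(\rho\sigma_2\sigma_1 - \lambda_2\right) \right)}{\left(\rho\sigma_2\sigma_1 - \lambda_2\right)\left(-\rho\sigma_2\sigma_1 + \lambda_1 + \lambda_2\right)} \\
&+ \lambda_1 \bar{X}_1 \bar{X}_2\, \frac{e^{-t\lambda_2} \left( \left(1 - e^{t\lambda_2}\right) \rho\sigma_2\sigma_1 - \left(1 - e^{t\lambda_2}\right) \lambda_1 - \left(1 - e^{\rho t \sigma_2\sigma_1 - t\lambda_1}\right) \lambda_2 \right)}{\left(\rho\sigma_2\sigma_1 - \lambda_1\right)\left(\rho\sigma_2\sigma_1 - \lambda_1 - \lambda_2\right)}
\end{aligned} \tag{155}$$

where we assume that

$$\lambda_1 > \frac{\sigma_1^2}{2}, \quad \lambda_2 > \frac{\sigma_2^2}{2}, \quad \lambda_1 + \lambda_2 > \rho\sigma_1\sigma_2. \tag{156}$$

The inequalities above guarantee that the inputs have finite variances, as
shown in equation (102). In the equilibrium state,

$$\lim_{t\to\infty} E[u(t)] = \bar{X}_1 \bar{X}_2\, \frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 - \rho\sigma_1\sigma_2}. \tag{157}$$

The output average is then equal to

$$E[Y] = \lim_{t\to\infty} E[Y(t)] = \frac{\bar{X}_1 \bar{X}_2}{a} \cdot \frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 - \rho\sigma_1\sigma_2}. \tag{158}$$

The last equation clearly shows that if the input noise sources are
correlated ($\rho \neq 0$), the average value of the output will be different from the value
when there is no correlation ($\rho = 0$). As shown in the other types of noise,
positive correlations increase the mean, and negative correlations reduce it.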
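
The transient and steady-state means for two inputs with filtered multiplicative noise can be checked numerically. The sketch below assumes that, after taking expectations, the double integrand reduces to $e^{-\lambda_1 x - \lambda_2 y + \kappa (x\wedge y)}$ with $\kappa = \rho\sigma_1\sigma_2$, and evaluates the two triangular regions in closed form; the names `input_mean`, `output_mean` and `kappa` are illustrative.

```python
import math


def input_mean(t, x1, x2, lam1, lam2, kappa):
    """E[u(t)] with kappa = rho*sigma1*sigma2; needs kappa != lam1, lam2, lam1 + lam2."""
    big = lam1 + lam2 - kappa
    common = (1 - math.exp(-big * t)) / big
    # Region y < x of the double integral.
    i1 = ((1 - math.exp(-lam1 * t)) / lam1 - common) / (lam2 - kappa)
    # Region y > x of the double integral.
    tail = math.exp(-lam2 * t) * (1 - math.exp(-(lam1 - kappa) * t)) / (lam1 - kappa)
    i2 = (common - tail) / lam2
    return lam1 * lam2 * x1 * x2 * (i1 + i2)


def output_mean(x1, x2, lam1, lam2, kappa, a):
    """Steady-state E[Y] when lam1 + lam2 > kappa."""
    return x1 * x2 * (lam1 + lam2) / (a * (lam1 + lam2 - kappa))
```

With `kappa = 0` the transient factorizes into the product of the two individual input means, and for `kappa > 0` the steady-state output mean exceeds the uncorrelated value, in line with the discussion above.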

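The variance can be computed using the same methods. First, we will
calculate the expected value of a product of exponentials of a standard
Wiener process evaluated at different times.

Lemma 6. If $t_1, t_2, \ldots, t_n \in \mathbb{R}_+$ is an ordered set of times such that
$t_1 \le t_2 \le \ldots \le t_n$ and $\sigma_1, \sigma_2, \ldots, \sigma_n \in \mathbb{R}_+$ are arbitrary positive
numbers denoting standard deviations, then

$$E\left[ \prod_{k=1}^{n} e^{\sigma_k W_{t_k}} \right] = \exp\left( \frac{1}{2} \sum_{k=1}^{n} \left( \sum_{m=k}^{n} \sigma_m \right)^2 (t_k - t_{k-1}) \right) \tag{159}$$

where $W_t$ is the standard Wiener process and $t_0 = 0$.

Proof. For each $t_k$, we decompose the Wiener process $W_{t_k}$ as a sum of
independent increments:

$$W_{t_k} = \sum_{m=1}^{k} \left( W_{t_m} - W_{t_{m-1}} \right). \tag{160}$$

Based on the sum above, we can write

$$\begin{aligned}
\prod_{k=1}^{n} e^{\sigma_k W_{t_k}} &= \exp\left( \sum_{k=1}^{n} \sigma_k W_{t_k} \right) \\
&= \exp\left( \sum_{k=1}^{n} \sum_{m=1}^{k} \sigma_k \left( W_{t_m} - W_{t_{m-1}} \right) \right) \\
&= \exp\left( \sum_{k=1}^{n} \left( W_{t_k} - W_{t_{k-1}} \right) \sum_{m=k}^{n} \sigma_m \right) \\
&= \prod_{k=1}^{n} \exp\left( \left( W_{t_k} - W_{t_{k-1}} \right) \sum_{m=k}^{n} \sigma_m \right)
\end{aligned} \tag{161}$$

where in the third line we changed the order of summation making use
of the triangle rule. All terms in the last product are independent:

$$\begin{aligned}
E\left[ \prod_{k=1}^{n} e^{\sigma_k W_{t_k}} \right] &= E\left[ \prod_{k=1}^{n} \exp\left( \left( W_{t_k} - W_{t_{k-1}} \right) \sum_{m=k}^{n} \sigma_m \right) \right] \\
&= \prod_{k=1}^{n} E\left[ \exp\left( \left( W_{t_k} - W_{t_{k-1}} \right) \sum_{m=k}^{n} \sigma_m \right) \right] \\
&= \prod_{k=1}^{n} \exp\left( \frac{1}{2} (t_k - t_{k-1}) \left( \sum_{m=k}^{n} \sigma_m \right)^2 \right) \\
&= \exp\left( \frac{1}{2} \sum_{k=1}^{n} \left( \sum_{m=k}^{n} \sigma_m \right)^2 (t_k - t_{k-1}) \right).
\end{aligned} \tag{162}$$

□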
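
Lemma 6 can be cross-checked against the standard Gaussian identity $E[e^{G}] = e^{\mathrm{Var}(G)/2}$ for a zero-mean Gaussian $G$, using $\mathrm{Cov}(W_s, W_t) = \min(s, t)$. The sketch below computes the exponent both ways; the function names and the sample values are illustrative.

```python
import math


def lemma6_exponent(times, sigmas):
    """0.5 * sum_k (sum_{m>=k} sigma_m)^2 * (t_k - t_{k-1}), with t_0 = 0."""
    total, prev, tail = 0.0, 0.0, sum(sigmas)
    for t_k, s_k in zip(times, sigmas):
        total += tail**2 * (t_k - prev)  # tail = sigma_k + ... + sigma_n
        prev, tail = t_k, tail - s_k
    return 0.5 * total


def gaussian_exponent(times, sigmas):
    """0.5 * Var(sum_k sigma_k W_{t_k}) using Cov(W_s, W_t) = min(s, t)."""
    return 0.5 * sum(
        si * sj * min(ti, tj)
        for ti, si in zip(times, sigmas)
        for tj, sj in zip(times, sigmas)
    )


if __name__ == "__main__":
    times, sigmas = [0.5, 1.0, 2.5], [0.3, 0.7, 0.2]  # times must be sorted
    print(math.exp(lemma6_exponent(times, sigmas)),
          math.exp(gaussian_exponent(times, sigmas)))
```

The two exponents agree up to rounding, which reflects that the increment decomposition in the proof is just a reorganization of the covariance sum.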



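When one of the inputs is affected by multiplicative noise, and the other
by additive noise, the mean value of the output is not affected, even if the
driving noise is the same in both cases. If we consider again the chemical
reaction (140), the differential equation in that case becomes

$$\frac{dY}{dt} = -aY + \left( \lambda_1 \bar{X}_1 \int_0^t e^{-\left(\lambda_1 + \frac{\sigma^2}{2}\right)x}\, e^{\sigma W_x}\, dx \right) \left( \bar{X}_2 + \sigma \int_0^t e^{-\lambda_2 (t-y)}\, dW_y \right). \tag{163}$$

The input is equal to

$$u(t) = \left( \lambda_1 \bar{X}_1 \int_0^t e^{-\left(\lambda_1 + \frac{\sigma^2}{2}\right)x}\, e^{\sigma W_x}\, dx \right) \left( \bar{X}_2 + \sigma \int_0^t e^{-\lambda_2 (t-y)}\, dW_y \right) \tag{164}$$

and its expected value is

$$\begin{aligned}
E[u(t)] ={}& \lambda_1 \bar{X}_1 \bar{X}_2 \int_0^t e^{-\left(\lambda_1 + \frac{\sigma^2}{2}\right)x}\, E\left[e^{\sigma W_x}\right] dx \\
&+ \sigma \lambda_1 \bar{X}_1 \int_0^t\!\!\int_0^t e^{-\left(\lambda_1 + \frac{\sigma^2}{2}\right)x}\, e^{-\lambda_2 (t-y)}\, E\left[e^{\sigma W_x}\, dW_y\right] dx.
\end{aligned} \tag{165}$$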

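In order to compute the second term of the last equation, we will need the
following Lemma about the expected value of the product of an exponential
Wiener process with an infinitesimal difference of the same process.

Lemma 7. If $W_t$ is a standard Wiener process, then

$$E\left[e^{\sigma W_s}\, dW_t\right] = \begin{cases} 0 & \text{if } s < t \\ \sigma e^{\frac{1}{2}\sigma^2 s}\, dt & \text{if } s > t. \end{cases} \tag{166}$$

Proof. If $s < t$, then $W_s$ and $dW_t = W_{t+dt} - W_t$ are independent, so

$$E\left[e^{\sigma W_s}\, dW_t\right] = E\left[e^{\sigma W_s}\right] E\left[dW_t\right] = 0. \tag{167}$$

Now, if $0 \le a < b \le s$, then, since $E\left[Z e^{\sigma Z}\right] = \sigma v\, e^{\frac{1}{2}\sigma^2 v}$ for a
zero-mean Gaussian variable $Z$ with variance $v$,

$$\begin{aligned}
E\left[e^{\sigma W_s} (W_b - W_a)\right] &= E\left[e^{\sigma W_a}\right] E\left[e^{\sigma (W_b - W_a)} (W_b - W_a)\right] E\left[e^{\sigma (W_s - W_b)}\right] \\
&= \sigma e^{\frac{1}{2}\sigma^2 s} (b - a).
\end{aligned} \tag{168}$$

Setting $a = t$ and $b = t + dt$, we get the desired result. □

Recalling equation (165),

$$\begin{aligned}
E[u(t)] &= \lambda_1 \bar{X}_1 \bar{X}_2 \int_0^t e^{-\lambda_1 x}\, dx + \sigma^2 \lambda_1 \bar{X}_1 e^{-\lambda_2 t} \int_0^t e^{\lambda_2 y} \left( \int_y^t e^{-\left(\lambda_1 + \frac{\sigma^2}{2}\right)x}\, e^{\frac{1}{2}\sigma^2 x}\, dx \right) dy \\
&= \bar{X}_1 \bar{X}_2 \left(1 - e^{-\lambda_1 t}\right) + \sigma^2 \bar{X}_1\, \frac{e^{-t(\lambda_1 + \lambda_2)} \left( \lambda_1 \left(1 - e^{t\lambda_2}\right) - \lambda_2 \left(1 - e^{t\lambda_1}\right) \right)}{\lambda_2 (\lambda_1 - \lambda_2)}.
\end{aligned} \tag{169}$$

As time $t$ grows large, the second term decays to zero, so

$$\lim_{t\to\infty} E[u(t)] = \bar{X}_1 \bar{X}_2 \tag{170}$$

and the mean of the output is

$$E[Y] = \frac{1}{a}\, \bar{X}_1 \bar{X}_2 \tag{171}$$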

which is exactly the same as in the case where the two noise inputs are com- 
pletely uncorrelated. So, input noise correlation does not affect the average 
concentration of the output in this case. 
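
The reason the correlation leaves the steady-state mean unchanged is that the second term of equation (169) is a constant prefactor times a purely transient factor. The sketch below evaluates that time-dependent factor in an overflow-safe form and compares it with a numerical quadrature of $\lambda_1 e^{-\lambda_2 t}\int_0^t e^{\lambda_2 y}\int_y^t e^{-\lambda_1 x}\,dx\,dy$; the prefactor is deliberately left out, and the function names are hypothetical.

```python
import math


def transient_factor(t, lam1, lam2):
    """Overflow-safe form of the time-dependent factor; requires lam1 != lam2."""
    num = (lam1 * math.exp(-lam1 * t) - lam2 * math.exp(-lam2 * t)
           + (lam2 - lam1) * math.exp(-(lam1 + lam2) * t))
    return num / (lam2 * (lam2 - lam1))


def transient_factor_quadrature(t, lam1, lam2, n=4000):
    """Midpoint rule for lam1 * int_0^t e^{-lam2 (t-y)} int_y^t e^{-lam1 x} dx dy."""
    h, total = t / n, 0.0
    for i in range(n):
        y = (i + 0.5) * h
        inner = (math.exp(-lam1 * y) - math.exp(-lam1 * t)) / lam1  # inner integral
        total += math.exp(-lam2 * (t - y)) * inner
    return lam1 * total * h
```

The factor starts at zero, rises briefly and then decays exponentially, so whatever constant multiplies it, the correction to the input mean vanishes in the equilibrium state.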

This section has analyzed how noise propagates in an arbitrary chemical
reaction network where one or more inputs include a random component.
The different noise sources may have arbitrary correlations with each other.
We have studied the propagation of both additive and multiplicative noise.
One of the main results is that even if all noise sources have mean equal to
zero, their correlations shift the mean of the outputs, for both types of noise.
If there is positive correlation, the mean of the output increases, and when
the correlation is negative, it shifts lower, and the same is true for the output
variance.

9 Conclusions 

We have shown how noise propagates in networks and how a network's noisy 
parameters can affect its output. Since many biological networks are locally 
tree-like, we have studied how noise propagates in the absence of feedfor- 
ward or feedback cycles. Tree networks are relatively easy to quantitatively 
analyze, since there is only one path from each node to another. We have 
derived a method to compute the variance of the output of any tree network, 
and shown that the variance is minimized when there are no "bottlenecks" 
in each pathway, in other words when there is no rate limiting step. When 
a network is not a tree, there are cycles, which means that a signal (along 
with its noise) can propagate through two or more paths towards the output. 
Feedback cycles typically reduce the output variance, and feedforward cycles 
increase it. When the noise sources are correlated, the variance in the output 
is larger, and small cycles have a stronger influence on the output, compared 
to longer cycles in both cases. Delays contribute to the decrease of the out- 
put noise when we have two or more noise sources, since their correlation 
is diminished. Crosstalk is also shown to decrease the output variance, but 
the tradeoff is that the output mean is lowered, or the concentration of the 
inputs needs to be proportionally higher in order to ensure the same output. 
In biological and chemical reaction networks, the reaction rates are prone 
to noise, since they depend on the concentration of other species. When 
the degradation rates are affected by noise, the result is increased output 
variance, which also depends on the concentration of the respective species, 
and the form of the output is different from when the noise is in the inputs, 
in the sense that higher concentrations also correspond to larger deviations 
from the mean. Finally, we have extensively studied how noise propagates
through chemical reaction networks where one or more of the reactants are
noisy, and their disturbances may be correlated. Even when the disturbances
have zero average, correlations change the output mean and variance.